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Abstract 

The  plenoptic  camera  enables  simultaneous  collection  of  imagery  and  depth  infor¬ 
mation  by  sampling  the  4D  light  field.  The  light  field  is  distinguished  from  data 
sets  collected  by  stereoscopic  systems  because  it  contains  images  obtained  by  an  N 
by  N  grid  of  apertures,  rather  than  just  the  two  apertures  of  the  stereoscopic  sys¬ 
tem.  By  adjusting  parameters  of  the  camera  construction,  it  is  possible  to  alter  the 
number  of  these  ‘subaperture  images,’  often  at  the  cost  of  spatial  resolution  within 
each.  This  research  examines  a  variety  of  methods  of  estimating  depth  by  deter¬ 
mining  correspondences  between  subaperture  images.  A  major  finding  is  that  the 
additional  ‘apertures’  provided  by  the  plenoptic  camera  do  not  greatly  improve  the 
accuracy  of  depth  estimation.  Thus,  the  best  overall  performance  will  be  achieved 
by  a  design  which  maximizes  spatial  resolution  at  the  cost  of  angular  samples.  For 
this  reason,  it  is  not  surprising  that  the  performance  of  the  plenoptic  camera  should 
be  comparable  to  that  of  a  stereoscopic  system  of  similar  scale  and  specifications.  As 
with  stereoscopic  systems,  the  plenoptic  camera  has  immediate  applications  in  the 
domains  of  robotic  navigation  and  3D  video  collection,  though  these  domains  may 
be  expanded  in  the  future  as  technological  advances  extend  the  range  over  which  the 
camera  accurately  recovers  depth. 
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RANGEFINDING  WITH  A  PLENOPTIC  CAMERA 


I.  Introduction 

Even  amidst  the  technological  marvels  of  the  21st  century,  the  human  vision  sys¬ 
tem  remains  arguably  the  most  impressive  of  its  kind  known  to  man.  Our  eyes  are 
integrated  together  with  a  multitude  of  systems  and  procedures  which  give  us  a  visual 
awareness  of  our  surroundings,  encompassing  aspects  like  structure,  depth,  motion, 
contiguousness,  texture,  and  more.  Though  no  computer  vision  system  may  ever  per¬ 
fectly  mimic  these  capabilities  without  major  breakthroughs  in  artificial  intelligence, 
certain  isolated  aspects  continue  to  move  within  the  grasp  of  modern  technology. 

Depth  perception  is  one  of  these  aspects.  In  the  human  vision  system,  depth 
information  is  obtained  through  a  variety  of  means.  Some  of  these  means,  such  as 
the  intelligent  evaluation  of  the  apparent  size  of  recognized  objects  or  other  aspects 
of  a  scene,  are  well  beyond  the  scope  of  this  thesis.  However,  other  methods  make  use 
of  a  simpler,  more  attainable  mechanism.  For  example,  the  displacement  between  a 
person’s  two  eyes  means  that  each  eye  renders  a  slightly  different  view  of  a  scene.  The 
brain  integrates  these  views  together  to  provide  a  single  image,  along  with  a  sense  of 
depth. 

The  information  afforded  in  this  manner  plays  an  important  part  in  how  we  in¬ 
teract  with  the  world,  as  is  clear  when  one  simply  closes  an  eye.  Upon  doing  so,  it 
immediately  becomes  more  difficult  to  ascertain  spatial  relationships  among  objects 
and  textures  in  a  scene  and,  in  short,  to  interact  with  one’s  surroundings. 

This  same  is  true  with  respect  to  platforms  employed  by  the  U.S.  Air  Force.  The 
coupling  of  depth  information  with  imagery  makes  each  more  usable.  In  the  context  of 
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remote  sensing,  depth  information  can  be  useful  in  the  automated  analysis  of  a  scene, 
as  it  provides  an  additional  channel  for  use  in  image  registration  and  segmentation. 
Depth  information  can  also  be  critical  for  understanding  the  dynamics  of  a  region 
and  preparing  operators  who  will  be  deploying  to  that  location.  Just  as  humans 
employ  depth  information  to  assist  in  movement  and  collision  avoidance,  so  depth 
information  can  assist  mobile  Air  Force  systems  in  performing  navigation.  Indeed, 
such  information  is  critical  for  any  autonomous  system  using  imagery  to  interact  with 
its  environment.  The  plenoptic  camera  is  of  interest  to  the  U.S.  Air  Force  because  it 
stands  to  provide  a  cheap  and  accurate  means  of  supplying  depth  information  in  real 
time  to  this  wide  variety  of  systems. 

Like  the  human  vision  system,  the  plenoptic  camera  relies  on  the  phenomenon  of 
parallax.  Parallax  refers  to  the  apparent  shift  in  an  object’s  location  with  respect 
to  its  background  and  foreground  when  viewed  along  different  lines  of  sight.  In  the 
context  of  computer  vision,  parallax  can  be  understood  as  the  fact  that,  given  two 
cameras  having  optical  axes  subject  to  relative  translation  and  rotation,  an  object’s 
imaged  location  will  shift  relative  to  the  optical  axis  in  a  depth-dependent  manner. 

In  its  dependence  on  the  parallax  effect,  the  plenoptic  camera  is  comparable  to  a 
slew  of  other  passive  ranging  technologies.  In  stereovision  systems,  the  parallax  effect 
is  quantified  in  the  disparity  between  the  pixel  location  of  an  object  between  two 
images  taken  from  slightly  different  angles  [1].  The  process  of  triangulation  is  used 
in  conjunction  with  this  information  to  provide  a  depth  estimate  for  an  object.  In  a 
similar  fashion,  structure  from  motion  (SFM)  techniques  determine  disparity  between 
images  taken  as  a  camera  moves  relative  to  a  scene  in  order  to  estimate  depth  [2], 
Insofar  as  more  than  two  views  of  an  object  are  available,  structure  from  motion 
techniques  provide  denser  sampling  than  stereoscopic  devices.  However,  knowledge 
of  the  change  in  camera  position  and  orientation  for  a  moving  platform  may  be 
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unknown  or  known  with  less  precision  than  the  relative  orientation  of  the  cameras  in 
stereoscopic  systems,  and  uncertainties  in  camera  location  will  introduce  error  into 
the  final  depth  estimation. 

A  less  intuitive  manifestation  of  the  parallax  phenomenon  involves  depth  estima¬ 
tion  from  defocus.  In  this  problem,  an  image  is  analyzed  to  determine  the  depth- 
dependent  circle  of  confusion  causing  blurring  at  each  image  point.  This  circle  of 
confusion  can  be  thought  of  as  resulting  from  the  difference  in  the  appearance  of  the 
scene  (resulting  from  the  parallax  effect)  when  viewed  through  different  portions  of  a 
camera’s  lens. 

Thinking  about  defocus  in  this  manner  is  a  helpful  primer  for  consideration  of 
the  plcnoptic  camera.  The  plenoptic  camera  incorporates  an  array  of  microlenses  in 
front  of  a  detector  array  in  order  to  separate  rays  incident  from  different  portions 
of  the  main  lens  [3].  Isolating  the  light  from  one  portion  of  the  main  lens  allows 
for  creation  of  a  ‘subaperture  image,’  i.e.,  an  image  appearing  as  if  taken  from  a 
small  subaperture  of  the  main  lens  [4].  The  collection  of  subaperture  images  can  be 
arranged  into  a  2D  array,  and  disparities  between  successive  images  used  to  calculate 
depth  in  a  manner  similar  to  structure  from  motion,  but  without  the  camera  position 
uncertainties  associated  with  that  technique.  Thus,  plcnoptic  camera  ranging  can  be 
usefully  thought  of  alternately  as  a  constrained  form  of  structure  from  motion,  or  as 
a  variant  of  the  depth  from  defocus  problem. 

The  dense  data  set  captured  by  the  plenoptic  camera,  consisting  of  a  collection  of 
images  of  an  object,  captured  from  an  N  by  M  grid  of  locations,  is  a  construct  that 
developed  within  the  image-based  rendering  community  under  the  title  of  the  ‘Light 
Field’  [5].  As  the  availability  of  plenoptic  cameras  for  easy  recording  of  the  light  held 
has  increased,  the  concept  of  light  held  has  seen  growing  interest  within  the  computer 
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vision  community,  and  numerous  approaches  and  algorithms  have  been  developed  for 
estimating  depth  from  the  sampled  light  held  [3]  [6]  [7]  [8]. 

The  purpose  of  this  research  is  to  provide  a  framework  for  thinking  about  and 
quantifying  the  ranging  capabilities  of  a  plenoptic  camera.  The  plcnoptic  camera 
design  contains  numerous  degrees  of  freedom  which  affect  different  aspects  of  its 
performance.  Some  of  these  effects  are  very  pronounced  and  others  subtle.  Depth 
estimation  accuracy  is  also  dependent  on  the  content  of  the  scene  being  imaged.  Pas¬ 
sive  ranging  systems  typically  have  trouble  with  regions  of  a  scene  barren  of  features, 
like  walls  in  a  building.  The  same  is  true  for  the  plcnoptic  camera,  which  gives  best 
performance  where  image  gradient  magnitudes  are  high. 

The  goal  of  this  research  is  to  provide  a  description  of  the  impact  of  these  various 
factors  on  the  plenoptic  camera’s  depth  resolving  performance.  This  involves  two 
major  areas  of  investigation.  The  first  area  of  investigation  concerns  the  sampling 
characteristics  of  a  plenoptic  camera.  Given  a  particular  plcnoptic  camera  geometry, 
how  does  this  geometry  sample  the  continuous  light  held?  What  will  the  sampled  light 
held  look  like  for  a  point  at  a  known  location  relative  to  the  camera?  Answering  these 
questions  requires  a  detailed  look  at  the  plenoptic  camera  geometry  and  sampling 
characteristics. 

Once  the  forward  process  of  light  held  sampling  has  been  defined,  the  range  find¬ 
ing  operation  is  simply  the  reverse  process  of  backing  out  the  location  of  a  point 
responsible  for  the  captured  light  held.  In  this  domain  we  ask  the  question,  how 
well  can  the  characteristic  shape  of  a  point  source  within  the  sampled  light  held  be 
identified  and  ht  to  a  model  which  then  yields  depth  information?  Answering  this 
question  requires  that  we  engage  with  the  modern  image  processing  techniques  which 
have  been  applied  to  light  held  imaging  and  ranging,  and  seek  to  understand  the 
sources  of  error  and  uncertainty  within  these  techniques. 
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The  overall  contribution  of  the  thesis  is  a  set  of  equations  which  define  the  perfor¬ 
mance  of  the  plcnoptic  camera  and  its  dependence  of  various  parameters  of  interest,  as 
well  as  empirical  testing  which  confirms  and/or  defines  the  scope  of  these  equations. 
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II.  Background 


Its  ability  to  perform  stereo  ranging  was  the  main  feature  noted  by  Adelson  and 
Wang  when  they  first  created  the  plcnoptic  camera  [3].  Observing  that  the  plenoptic 
camera  allowed  for  an  object  to  be  viewed  through  different  sections  or  sub-apertures 
of  the  main  camera  lens,  they  developed  an  algorithm  to  determine  depth  from  the 
resulting  parallax  shift  in  object  location.  Since  this  shift  is  manifest  as  a  sloping 
of  lines  when  neighboring  subaperture  images  are  stacked  on  top  of  each  other  (see 
Fig.  8),  their  algorithm  incorporated  image  gradient  to  estimate  the  slope  direction 
within  a  region  of  the  light  field. 

The  potential  to  perform  refocusing  using  light  fields  was  first  explored  by  Isaksen 
et  al.  in  the  context  of  image-based  rendering  [9].  The  refocusing  operation  consists 
of  nothing  more  than  a  shifted  superposition  of  subaperture  images  in  a  manner  which 
counteracts  the  parallax  effect  for  a  given  object  depth.  Not  until  the  construction 
of  a  hand-held  plenoptic  camera  by  Ng  et  al.  was  this  capability  demonstrated  in 
the  context  of  light  held  photography  with  a  plcnoptic  camera  [10].  In  principle, 
range  finding  via  refocusing  involves  the  construction  of  a  stack  of  refocused  images 
followed  by  a  search  for  sharp  features  within  each  image  to  isolate  depths  which 
contain  objects. 

The  sheared  projection  which  constitutes  this  refocusing  operation  bears  strong 
similarity  to  certain  computed  tomographic  techniques  employed  in  medical  imaging. 
In  that  context,  projections  of  a  density  distribution  obtained  by  radiographic  tech¬ 
niques  such  as  x-ray  scanning  are  used  to  recreate  the  original  density  distribution. 
Here,  the  density  distribution  plays  a  role  analogous  to  that  of  the  light  field,  and 
the  projections,  a  role  equivalent  to  that  of  the  refocused  images.  Often,  a  useful 
relationship  exists  between  the  distribution  and  its  projections  in  a  transformed  do¬ 
main  such  as  the  Fourier  domain.  In  the  medical  imaging  domain,  such  relationships 
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can  help  simplify  the  process  of  reconstructing  the  density  distribution,  and  subse¬ 
quently  rendering  views  of  the  object  from  different  perspectives.  In  the  domain  of 
light  held  imaging,  this  second  aspect  is  most  readily  applicable.  Ng  et  al.  show  that 
image  rendering  performed  in  this  manner  can  provide  a  significant  reduction  in  the 
computational  cost  of  refocusing  [11],  This  is  because  refocused  images  are  obtained 
in  the  Fourier  domain  by  extracting  a  2D  slice  from  the  light  held,  as  opposed  to 
the  projection  operation  required  in  the  spatial  domain.  A  Fourier  domain  approach 
to  the  depth-through-refocusing  technique,  demonstrated  in  [12],  searches  for  images 
having  high  spatial  frequencies  of  large  magnitude,  as  this  suggests  the  presence  of 
sharp  features  associated  with  in-focus  objects. 

Plenoptic  camera  range  finding  also  benefits  from  research  performed  on  light 
fields  generated  by  methods  predating  the  plenoptic  camera.  Light  fields  have  been 
traditionally  collected  using  2D  arrays  of  cameras.  A  single  camera  mounted  on  a 
gantry  allowing  for  translation  in  two  dimensions  also  allows  for  scanning  light  held 
collection.  Some  of  the  first  work  with  such  data  used  edge  detection  and  line  fitting 
to  estimate  light  held  slope,  as  in  [6].  This  technique  is  comparable  to  the  section 
concerning  the  application  of  the  SIFT  algorithm  to  light  held  ranging  within  this 
thesis  (See  Section  4.3).  More  recent  work  with  light  helds  gathered  from  track- 
mounted  cameras  estimates  local  light  held  orientation  by  finding  the  slope  along 
which  the  light  held  shows  high  consistency  (low  variance)  [7].  The  uncertainty  of 
this  approach  is  assessed  in  detail  within  this  report.  Some  of  the  most  sophisticated 
techniques  for  employing  light  helds  from  plenoptic  cameras  involve  the  combination 
a  local  slope  estimator  with  a  system  of  global  constraint  enforcement.  The  global 
optimization  framework  employed  in  [8]  employs  the  structure  tensor  to  provide  a 
local  slope  estimate.  The  structure  tensor,  derived  in  [13],  starts  with  the  principle 
that  a  region  having  a  particular  orientation  should  contain  energy  concentrated  along 
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a  line  in  the  Fourier  domain.  Orientation  detection  can  be  achieved  by  least  squares 
fitting  to  this  line,  which  is  an  operation  able  to  be  performed  entirely  within  the 
spatial  domain.  The  structure  tensor  itself  achieves  good  local  estimates  compared  to 
other  methods  like  the  gradient  method  in  [3] .  The  estimate  is  improved  by  employing 
an  optimization  framework  in  which  the  cost  of  assigning  a  given  depth  takes  into 
account  the  ordering  of  objects,  evidenced  by  occlusions,  as  well  as  certain  other 
features,  such  as  the  location  of  edges  yielded  via  edge  detection. 


III.  Imaging  Theory 


3.1  Introduction 

The  dataset  captured  by  a  plenoptic  camera  is  known  as  a  light  field  [3].  The 
light  field  is  based  on  a  geometric  optics  formulation  of  light  by  which  it  describes 
the  propagation  of  light  energy  in  a  space.  Some  of  the  earliest  work  with  light 
fields  appears  in  the  context  of  image  based  rendering.  See  [14]  and  [5]  for  detailed 
descriptions  of  the  light  field  within  that  context.  The  goal  of  this  chapter  is  to 
describe  how  the  light  field  is  captured  by  the  plenoptic  camera,  and  to  examine 
what  the  sampled  light  field  will  look  like  for  a  point  source  at  some  known  location. 
To  this  end,  different  subspaces  of  the  light  field  allowing  for  easy  visualization  of 
its  structure  will  be  discussed.  The  process  of  generating  refocused  images  from  the 
light  field  will  be  considered  within  both  the  spatial  and  frequency  domains.  Finally, 
the  traditional  plenoptic  camera  sampling  geometry  will  be  compared  with  that  of 
the  ‘focused’  plenoptic  camera,  and  a  scheme  for  creating  a  focused  plenoptic  camera 
with  equivalent  sampling  characteristics  to  that  of  a  traditional  plenoptic  camera  will 
be  presented. 


3.2  The  Light  Field  within  a  Simple  Imaging  System 


In  perhaps  its  most  basic  form,  the  light  field  is  nothing  more  than  the  radiance 
distribution  in  a  2D  plane.  The  radiance  along  a  ray  at  the  point  (s,t)  and  in  the 
direction  (0,0),  defined  according  to  Fig.  1,  is  given  by  [15] 


L(s,  t,  9,  0) 


d24> 

dQdAicosO 


(1) 
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Figure  1.  The  Light  Field  as  a  Radiance  Distribution.  The  light  field  can  be  thought 
of  as  the  radiance  distribution  along  a  2D  plane.  L(s,t,6,(j>)  gives  the  radiance  at  the 
position  (s,t)  and  in  the  direction  (0,(j>).  This  direction  can  be  conveniently  defined  in 
terms  of  a  second  point,  (u,v),  on  a  plane  parallel  to  the  original  plane.  Likewise,  the 
solid  angle,  f 2,  used  to  normalize  the  radiance,  can  be  given  in  terms  of  an  area  on  the 
second  plane,  A2,  and  the  distance  between  the  two  planes,  l. 


where  L  has  units  of  [W/cm2sr\.  The  angles  in  this  equation  can  be  defined  con¬ 
veniently  by  considering  the  intersection  of  the  ray  with  a  second  plane  positioned 
parallel  to  the  first.  The  geometry  needed  for  performing  this  parametrization  is  pic¬ 
tured  in  Fig.  1.  We  define  $(s,  t,  9,(f>)  as  the  radiant  flux  exiting  the  area  Ai  in  the 
(s,  t)  plane  into  the  solid  angle  hi,  defined  by  the  pyramid-shaped  region  in  the  figure. 

The  length,  h  of  this  region  is  also  the  hypotenuse  of  the  right  triangle  with  base 
l  and  height  o.  The  length  of  the  hypotenuse  is  determined  via  the  Pythagorean 
theorem  as 

h  =  \Jl 2  +  (s  —  u )2  +  (t  —  v)2.  (2) 


This  allows  for  the  angle,  9,  to  be  defined  implicitly  as 


sjl2  +  (s  -  u)2  +  (t  -  v)2 
cos  (9)  =  — - - - 


u )2  +  (t 
¥~ 


(3) 


At  this  point,  we  make  the  assumption  that  l  is  sufficiently  large  compared  to  the 
other  dimension  that  cos (9)  ~  1. 
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Since  the  angle  spanned  by  A2  is  small,  the  solid  angle  fl  can  be  approximated  by 


Q  «  A2cos(9)/l2  «  AuAv/l2 


(4) 


where  the  cosine  is  approximated  to  equal  one  as  discussed  above.  Using  these  defi¬ 
nitions,  along  with  A\  =  As  At,  Eq.  1  can  be  written  in  an  alternate  parametrization 


L(s,  t ,  u,  v )  = 


l2d4$(s,  t,  u,  v ) 
dsdtdudv 


We  are  interested  in  using  this  parametrization  of  the  light  held  to  describe  the 
radiance  distribution  inside  of  a  simple  camera.  We  let  the  (s,  t)  plane  represent  the 
focal  plane  or  detector  plane  of  the  camera,  and  the  (u,  v)  plane  represent  the  plane  of 
the  collecting  lens.  The  function  L(s,t,u,v)  gives  the  radiance  along  a  ray  traveling 
from  the  point  (u,  v )  on  the  main  lens  plane  to  the  point  (s,  t)  on  the  focal  plane. 
The  variables  u  and  v  are  defined  as  {(u,v)  G  3ft2  :  |w|  <  R  A  |u|  <  \/ K2  —  u2}  where 
R  =D/2  is  the  radius  of  the  main  collecting  lens.  The  variables  s  and  t  are  likewise 
defined  as  {(s,t)  G  3ft2  :  |s|  <  W8j 2  A  \t\  <  Wt/2}  where  1US  and  Wt  are  the  widths 
of  the  rectangular  collecting  area  (which  we  will  later  define  as  the  area  containing 
microlenses  in  the  case  of  a  plenoptic  camera)  in  the  s  and  t  dimensions.  The  zero  of 
each  axis  is  located  at  the  optical  axis  of  the  camera. 

We  assume  an  ideal  imaging  relationship  between  an  object  point  located  outside 
of  the  camera  a  distance  za  from  the  main  lens  plane  and  an  image  point  located 
inside  of  the  camera  a  distance  Zi  from  the  main  lens,  where  za  and  z%  are  related  by 
the  imaging  relation 

111 
- ' —  “7 

Zi  j 

where  /  is  the  focal  length  of  the  lens.  The  assumption  of  an  ideal  imaging  rela¬ 
tionship  implies  that  all  rays  coming  from  the  object  point  and  passing  through  the 
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Figure  2.  The  Light  Field  within  a  Camera.  The  light  field  can  be  used  to  describe  the 
radiance  distribution  within  an  imaging  system.  Under  the  condition  of  ideal  imaging, 
all  of  the  light  from  a  given  object  point  will  pass  through  a  point  (x,  y)  on  a  plane 
located  some  distance  Z{  from  the  main  lens. 


aperture  stop  represented  by  the  main  collecting  lens  will  also  pass  exactly  through 
the  image  point  defined  by  the  distance  Zi  and  the  coordinate  (x,y),  which  specifies 
the  transverse  location  of  the  image  relative  to  the  optical  axis.  Fig.  2  provides  a 
visualization  of  this  scenario. 

The  requirement  that  all  rays  pass  through  a  specified  image  point  for  a  given 
object  point  implies  a  mapping  between  the  uv  plane  and  the  st  plane  that  is  unique 
for  each  object  point.  This  mapping  specifies  the  region  of  the  light  held  containing 
the  radiance  from  the  point  source.  Fig.  3  provides  the  geometry  for  understanding 
this  mapping  in  two  dimensions.  In  passing  through  the  image  point,  the  ray  forms 
two  similar  triangles,  one  on  each  side  of  the  image  plane.  Their  dimensions  are 
related  by 

—  =  — •  (?) 
This  equation  is  solved  for  s  in  terms  of  u  by 


s  =  x 


Zi 


mu  +  x(l  —  m) 


(8) 
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Figure  3.  Ray  Mapping  in  an  Imaging  System.  The  condition  of  ideal  imaging,  that 
all  rays  from  a  given  object  pass  through  the  same  point  within  the  camera,  imposes 
a  mapping  between  the  main  lens  position  (u,v)  and  the  focal  plane  position  ( s,t )  The 
ray  in  the  figure  forms  two  similar  triangles,  one  anterior  to  the  image  plane  and  one 
posterior.  Equating  the  ratio  of  each  triangle’s  dimensions  yields  an  equation  that  can 
be  solved  to  establish  the  mapping  between  (s,  t)  and  (u,  v )  unique  to  a  given  image 
location. 

where  m  =  — za/zi .  Substituting  za  =  l  —  Zi  and  letting  a  =  Zi/l,  we  see  that 
m  =  1  —  1/a,  in  agreement  with  the  format  used  in  [10].  The  equation  relating  t  and 
v  can  be  derived  in  the  same  manner.  Eq.  8  is  nothing  more  than  the  equation  of  a 
line  with  slope  m  and  an  u- intercept  related  to  x.  Since  m  is  a  function  of  zt.  it  is 
implicitly  a  function  of  object  distance.  This  fact  will  become  of  much  importance  as 
we  move  forward. 

Ignoring  the  effects  of  reflection  and  absorption  by  optical  elements,  the  radiance 
within  the  conic  regions  of  Fig.  2  will  be  equal  to  the  radiance  at  the  source  object. 
We  can  use  the  mapping  defined  in  Eq.  8  to  state  this  mathematically,  as  in 

L  (mu  +  x(l  —  m),mv  +  y(  1  —  m),u,  v)  =  L0(x,  y,  m)  (9) 

where  L0(x,y,m )  is  the  radiance  at  the  object  point  associated  with  (. x,y,m ).  Iden¬ 
tifying  an  object  point  in  this  manner  is  appropriate  since  each  of  these  values  can 
be  determined  directly  from  the  location  of  the  object  in  world  space.  The  equation 
is  simplified  if  we  identify  the  object  point,  not  by  the  location  of  its  image,  (x,y), 
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but  by  the  center  of  its  circle  of  confusion  at  the  microlens  plane,  (s,  t).  This  location 
is  determined  by  setting  u  —  0  in  Eq.  8,  whereupon  we  see  that  s  =  a;(l  —  m)  and 
t  =  a;(l  —  m).  Upon  making  this  substitution,  we  obtain  the  simplified  form 

L  (s  +  mu,  t  +  mv ,  u,  v)  =  L0(s,  t,  m).  (10) 


For  a  given  object  point,  the  values  of  m,  s,  and  t  in  this  equation  will  be  fixed. 
Allowing  u  and  v  to  vary  across  their  respective  domains,  the  equation  assigns  the 
radiance  Lq  to  the  points  on  a  2D  plane  within  the  4D  space  of  the  light  held.  In  the 
2D  subspace  of  the  light  held  defined  by  fixing  t  and  v,  this  plane  will  appear  as  a  ID 
line  of  slope  ds/du  =  m  (See  Fig.  6).  More  detailed  interpretations  of  the  equation 
are  explored  in  future  sections. 


3.3  Light  Field  Sampling  with  a  Plenoptic  Camera 

Fig.  4  provides  a  comparison  of  a  conventional  camera  and  a  plenoptic  camera. 
A  detector  element  of  a  conventional  camera  captures  only  information  about  the 
irradiance  (power  per  unit  area)  at  the  focal  plane.  The  irradiance  is  determined  by 
integrating  the  radiance  function,  L ,  over  the  solid  angle  defined  by  the  main  lens. 
Using  Eq.  5,  this  integration  can  be  performed  instead  over  the  lens  area, 


E(s,t) 


1 

T2 


L(s,  t ,  u,  v)dvdu 


(11) 


where  u  and  v  are  integrated  over  their  domains,  as  defined  earlier. 

The  radiant  hux  (power)  captured  by  a  detector  will  be  equal  to  the  integrated 
irradiance  over  the  surface  of  the  detector.  We  represent  this  integration  by  convolving 
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Figure  4.  Comparison  between  Plenoptic  and  Conventional  Cameras,  (a)  shows  the 
overlapping  circles  of  confusion  for  two  out-of-focus  point  sources  in  a  traditional  cam¬ 
era.  The  insertion  of  a  microlens  array  before  the  detector  array  in  (b)  allows  the  light 
from  ‘out-of-focus’  point  sources  to  be  sampled  separately.  The  microlenses  are  placed 
one  focal  length  away  from  the  detector  plane.  This  allows  for  each  microlens  to  be 
envisioned  as  an  angularly  sensitive  pixel,  and  makes  possible  mapping  between  each 
microlens  subpixel  and  a  region  of  the  main  lens. 
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the  irradiance  distribution  with  a  detector-shaped  kernel,  hd(s,t),  given  by 


hd(s,t)  —  RECT(s/ As,t/ At)  (12) 

where  here  As  and  At  are  detector  sizes.  Sampling  is  then  achieved  via  multiplication 
with  a  comb  function,  having  a  periodicity  matched  to  the  detector  pitch,  such  that 
[16] 


1 >(«,*) 


[E{s,t)*hd{s,t)]-^COMB 


s  t  \ 
As’  ~At )  ' 


(13) 


Since  the  camera  detects  only  irradiance,  it  will  not  distinguish  between  the  overlap¬ 
ping  circles  in  Fig.  4a.  Where  the  cones  of  light  from  the  two  point  sources  overlap, 
they  both  contribute  to  the  irradiance  integrated  by  the  detector. 

The  plenoptic  camera  introduces  a  microlens  array  in  front  of  the  detector  array  of 
a  conventional  camera.  In  this  section,  we  consider  the  case  where  the  microlenses  are 
separated  from  the  detector  array  by  a  microlens  focal  length.  Since  the  microlenses 
are  small  relative  to  the  separation,  /,  between  the  microlens  plane  and  the  main 
lens,  the  situation  of  the  main  lens  approximates  ‘optical  infinity’  [10].  This  means 
that  the  detectors  behind  each  microlens  image  the  back  of  the  main  lens,  with  each 
detector  integrating  up  the  radiance  from  the  region  of  its  instantaneous  field  of  view 
(IFOV).  This  allows  for  the  light  from  the  two  point  sources  in  Fig.  4b  to  be  recorded 
separately,  since  each  point  source  inhabits  a  different  region  of  the  light  field. 

Fig.  5  shows  how  the  detector  size  A q  and  the  microlens  size  As  stop  the  region 
of  the  light  field,  L,  integrated  by  each  pixel.  The  image  of  the  detector  at  the  main 
lens  plane  gives  the  IFOV,  A u  =  A q  lm/ld-  In  order  to  obtain  a  radiant  flux  quantity 
associated  with  each  detector,  Eq.  5  must  be  integrated  over  the  regions  defined  by 
the  microlens  in  front  of  the  detector  as  well  as  the  detector  IFOV  at  the  main  lens 
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Figure  5.  Plenoptic  Camera  Radiometry.  Each  microlens  performs  stopping  of  the 
radiance  reaching  the  detectors  beneath  it.  The  radiance  is  also  stopped  at  the  at 
detector  plane  by  the  extent  of  the  detector  itself.  Projecting  the  detector  dimensions 
forward  onto  the  main  lens  allows  us  to  picture  this  stopping  as  occurring  at  the  main 
lens  plane  according  to  the  dector  IFOV,  Au.  More  precisely,  the  dimensions  of  each 
microlens  and  the  detector  IFOV  provide  the  limits  of  integration  needed  for  obtaining 
a  radiant  flux  by  integrating  Eq.  5 

plane.  Analogously  to  the  conventional  camera,  we  represent  this  integration  as  a 
convolution  with  a  4D  kernel  whose  dimensions  are  determined  by  the  size  of  these 
regions,  given  by 


h(s,t,u,v )  =  RECT(s/ As,  t/At,  u/Au,  v/Av). 


(14) 


Once  again,  sampling  is  achieved  via  multiplication  with  the  appropriate  comb  func¬ 
tion, 


S(s,  t ,  u,  v ) 


1 

AsAtAuAv 


COMB 


{  S  t  U  V  \ 

\As’  At’  Am’  ~Av )  ’ 


(15) 


such  that 


&(s,t,u,v)  =  -j^[L(s,t,u,v)  *  h(s,t,u,v)]S(s,t,u,v). 


(16) 


Note  that,  although  it  is  likely  that  microlenses  will  be  circular  in  shape  and  arranged 
in  a  non-rectangular  grid,  our  sampling  equations  assume  rectangular  lenses  and  a 
rectangular  grid  for  the  sake  of  simplicity.  Even  where  a  close-packed  hexagonal  grid 
of  circular  microlenses  is  used  to  maximize  fill-factor,  resampling  of  the  light  field 
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a) 


Figure  6.  Plenoptic  Camera  Sampling:  How  the  plenoptic  camera  samples  the  sloped 
line  associated  with  a  point  source.  The  process  of  convolution  followed  by  sampling 
with  the  comb  function  in  Eq.  16  collects  all  the  energy  associated  with  the  portion  of 
the  line  within  the  rectangular  region  and  attributes  it  to  the  discrete  sample  at  the 
center  of  the  region.  Each  row  of  the  figure  represents  a  row  of  pixels  within  a  different 
subaperture  image.  Each  column  is  a  row  of  pixels  within  a  microlens  image. 


after  data  collection  can  be  used  to  give  effective  sampling  characteristics  similar  to 
those  indicated  here  [17]. 

Fig.  6  illustrates  how  the  plenoptic  camera  samples  the  sloped  line  associated 
with  a  point  source,  as  discussed  in  the  previous  section.  The  process  of  convolution 
followed  by  sampling  with  the  comb  function  in  Eq.  16  collects  all  the  energy  associ¬ 
ated  with  the  portion  of  the  line  within  the  rectangular  region  and  attributes  it  to  the 
discrete  sample.  The  slope,  m,  in  s  samples  per  u  sample  is  related  to  the  continuous 
slope  m  by  the  relation 


where  7  =  Au/As.  Note  that  there  is  a  critical  point  at  \m\  >  1,  where  the  point 
source  illuminates  multiple  microlenses  per  subaperture  (multiple  pixels  per  subaper¬ 
ture  image,  represented  by  a  row  within  the  figure). 


=  m'y 


(17) 


ds/As  A  u 

m  =  -  =  m—— 

du/Au  As 
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Discretized  Light  Field  Representation. 


We  find  that  when  working  with  actual  sampled  light  fields,  it  is  very  convenient 
to  work  with  a  normalized  form  of  the  light  field  dimensions.  To  this  end,  we  define 
the  function 

K(s,t,u,v )  =  &(sAs,tAt,uAu,vAv)  (18) 

where  the  variable  ranges  are  defined  in  Table  1.  Ns  and  Nt  give  the  number  of 
microlenses  in  each  respective  dimension.  Nu  and  Nv  give  the  number  of  aperture 
regions  in  each  dimension,  where  this  number  is  related  to  the  number  of  pixels 
beneath  each  microlens  by  Nu  =  As/Aq. 

Table  1.  LF  Dimension  Intervals 


Variable 

is  a  member  of  the  set 

Variable 

is  a  member  of  the  set 

s 

[-Wa/2,Wa/2] 

s 

l-(N,  -  l)/2,  ( N ,  -  l)/2] 

t 

[~Wt/2,Wt/2] 

i 

[-(JV,  -  l)/2,  (Nt  -  l)/2] 

U,V 

{( u ,  v)  e  !ft2  :  \u \  <  R  A  n  <  \J Ft?  —  u2} 

u 

[~(NU  -  l)/2,  (JV„  -  l)/2] 

V 

(-(JV„  -  l)/2,  (JV,  -  l)/2] 

We  allow  the  normalized  variables  to  take  on  real  values.  However,  where  non¬ 
integer  values  are  used,  it  is  implied  that  some  form  of  interpolation  must  be  employed 
in  order  to  provide  the  function  value.  In  most  cases  within  this  document,  interpo¬ 
lation  will  be  discussed  explicitly.  When  a  normalized  variable  is  used  as  the  index 
of  a  summation,  the  variable  is  assumed  to  span  the  set  of  integers  contained  in  the 
interval  defined  by  Table  1. 

Eq.  8,  which  specifies  the  region  of  the  radiance  distribution  populated  with  the 
radiance  from  a  single  point  source,  must  be  modified  for  use  with  these  normalized 
coordinates.  The  modified  equation  is  given  by 

K  ( s  +  mu,  t  +  fhv,  u,  v)  =  K0(s,  t,  in).  (19) 
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3.4  Light  Field  Subspaces  and  Image  Formation 


It  is  useful  to  gain  a  sense  of  the  information  contained  in  a  number  of  2D  sub¬ 
spaces  of  the  light  held  formed  by  fixing  two  of  its  parameters.  In  its  two-plane 
parametrization  the  light  held  contains  a  certain  degree  of  symmetry,  as  it  can  be 
used  to  describe  the  radiance  at  either  of  the  two  planes  used  in  the  parametrization. 
Nonetheless,  the  structure  of  the  information  contained  in  the  light  held  is  very  much 
asymmetric  when  the  light  held  is  used  to  describe  the  radiance  distribution  within 
an  imaging  system. 

The  coordinates  u  and  v  of  the  sampled  light  held  specify  the  location  of  the 
subaperture  of  size  AuAv  which  crops  the  radiance  integrated  to  give  the  radiant 
hux  &(s,t,u,v)  collected  by  a  detector.  Likewise,  the  coordinates  s  and  t  specify  the 
location  of  a  microlens  of  size  As  At  which  crops  the  radiance  at  the  microlens  plane. 
Comparing  Eq.  16  and  13  shows  that  this  results  in  spatial  sampling  in  the  microlens 
plane  that  depends  on  microlens  pitch  in  the  same  way  that  focal  plane  sampling 
depends  on  pixel  pitch  in  a  conventional  camera. 

From  these  considerations,  we  expect  that  by  fixing  u  and  v  to  some  value,  we 
will  obtain  an  image  very  comparable  to  that  taken  with  a  conventional  camera  of 
pixel  size  As  At,  but  with  a  lens  that  is  masked  to  only  allow  light  through  the  region 
indicated  by  the  coordinate  (u,v).  This  2D  light  held  slice  is  what  is  known  for  this 
reason  as  a  ‘subaperture  image’  [4],  As  illustrated  in  Fig.  7,  the  subaperture  image 
will  have  reduced  blur  for  defocused  objects  due  to  the  higher  f-number  associated 
with  the  smaller  aperture  size.  The  figure  also  illustrates  how  the  mapping  identified 
in  Eq.  19  results  in  a  w-dependent  shift  in  the  image  location. 

Fig.  8  shows  the  3D  subspace  of  the  light  held  obtained  by  fixing  v.  The  rep¬ 
resentation  of  the  subspace  is  built  by  vertically  stacking  all  the  subaperture  images 
having  the  same  value  of  v.  As  expected,  the  top  surface  of  the  structure  has  the 
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Figure  7.  Subaperture  Image  Formation.  The  top  row  of  the  Figure  illustrates  the 
image  collected  by  a  conventional  camera  for  two  defocused  point  sources.  A  subaper¬ 
ture  image  (bottom  row)  is  obtained  from  the  light  held  (middle  row)  by  fixing  u  and  v 
(i.e.  by  taking  all  of  the  pixels  ‘looking’  at  a  particular  subaperture).  The  subaperture 
image  is  sharper  than  the  conventional  image  due  to  its  higher  f-number.  Also,  the 
location  of  the  point  sources  within  the  image  is  shifted  due  to  the  mapping  of  Eq.  19. 
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Figure  8.  Light  Field  Slices,  (a)  shows  a  3D  subspace  of  the  light  field  obtained  by 
fixing  v.  The  representation  shown  here  is  formed  by  stacking  a  row  of  subaperture 
images  into  a  3D  cube.  The  slice  of  this  cube  obtained  by  fixing  the  spatial  coordinate, 
t,  is  known  as  an  epipolar  plane  image  (EPI),  and  is  shown  in  (b).  In  the  4D  light 
field,  a  point  source  in  object  space  is  represented  by  a  2D  plane.  In  an  EPI,  this  plane 
appears  as  a  line  with  slope  fh. 


characteristics  of  an  image  taken  by  a  conventional  camera.  The  front  face  of  the 
structure  is  the  2D  plane  formed  by  fixing  t  and  v.  This  subspace  is  known  as  an 
Epipolar  plane  image  or  EPI  [6].  The  sloped  lines  visible  in  this  image  result  from 
difference  in  the  apparent  location  of  objects  when  viewed  through  different  apertures 
of  the  camera  under  the  parallax  effect.  The  slope  of  each  line,  given  by  Eq.  17,  is 
related  to  the  distance  from  the  camera  to  the  point  responsible  for  the  line. 
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Image  Formation. 


By  integrating  Eq.  16  over  u  and  v,  we  are  able  to  recover  the  image  produced 
by  the  conventional  camera  in  Eq.  13.  This  is  not  surprising,  as  the  summation  of 
subaperture  images  is  nothing  more  than  the  recombination  of  the  information  shown 
to  be  separately  captured  in  Fig.  4.  The  image  is  given  by 

img(s,  t)  =  u,  v)  (20) 

U  V 

where  the  summations  are  over  all  integer  values  within  the  intervals  defined  for  u 
and  v. 

The  summation  over  u  and  v  will  result  in  the  2D  lines  in  Fig.  6  and  Fig.  8b  being 
projected  down  into  one  dimension.  It  is  evident  that,  if  the  line  is  initially  vertical, 
its  projection  will  fill  only  one  pixel  in  the  the  final  image,  img(s,  t).  Conversely,  if  the 
line  is  sloped  such  that  it  spans  multiple  s  samples,  it  will  impact  multiple  pixels  of 
the  projection  img(s,t).  This  spreading  out  of  sloped  lines  is  nothing  more  than  the 
circle  of  confusion  associated  with  out-of-focus  points  in  a  conventional  photograph. 

When  an  image  is  formed  via  projection,  vertical  lines  (m  =  0)  appear  as  in-focus 
points,  while  increasing  slope  leads  to  an  increasing  degree  of  defocus  in  the  generated 
image.  This  suggests  that  some  degree  of  refocusing  may  be  performed  by  ‘shearing’ 
the  light  held  by  some  amount,  Am,  prior  to  projecting,  such  that  points  originally 
having  slope  —Am  will  appear  to  be  in  focus. 

Though  we  arrive  at  this  result  from  an  intuitive  consideration  of  the  light  held 
structure,  it  is  possible  to  achieve  the  same  result  more  formally  by  looking  at  the 
re-parametrization  of  the  light  held  necessary  to  simulate  a  conventional  camera  with 
varying  focal  length,  as  in  [10]. 
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In  order  to  represent  the  shearing  operator,  we  find  it  convenient  to  define  the 
vectors  x  =  [s,  t,  u,  v]T,  s  =  [s,  f]T,  and  u  =  [ u,v]T .  We  then  allow  each  of  our 
functions  to  accept  these  vectors  as  arguments,  as  in  /(x)  =  f(s,t,u,v).  Under  this 
convention,  the  shearing  operator  can  be  defined  by  a  matrix  multiplication  of  the 
argument  vector,  as  in 

B[/(x)](x)=/(iT1x),  (21) 

where  the  matrix  needed  to  shear  the  light  held  by  the  slope  fh  is  given  by  [11] 

1  0  —  fh  0  1  0  fh  0 

0  1  0  —  fh  ,  0  10m 

=  •  (22) 

0  0  1  0  0  0  1  0 

0  0  0  1  0  0  0  1 

To  conhrm  that  this  operator  has  the  desired  effect,  we  write  Eq.  19  in  terms  of  the 
new  vector  coordinates  introduced  here. 

K([s  +  mu,  t  +  fhv,  u ,  v]T )  =  K0(s,  t,  fh).  (23) 

We  wish  to  shear  the  light  held  such  that  the  object  identified  by  (s,  t,  fh)  is  repre¬ 
sented  within  the  light  held  by  a  vertical  line.  To  do  this  we  apply  the  operator  B-r-n. 
Under  the  effects  of  this  transformation,  the  value  K0(s,i,fh)  is  remapped  to  a  new 
region  of  the  light  held: 

K0 (s,  t,  fh)  =  B-rh [K (x)]  (x)  =  K (^Z^x) .  (24) 
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The  matrix  product  £>_,f  x  evaluates  as 

1  0  —  m  0 

0  1  0  —  m 

0  0  1  0 

0  0  0  1 

such  that  the  final  result  of  the  shearing  operation  is  the  updated  mapping, 

A'([M,  U, a)]r)  =  K0(s,i,m).  (26) 


s  +  mu 

s 

t  +  mu 

i 

u 

u 

V 

V 

(25) 


This  equation  confirms  that,  under  the  impact  of  the  shearing  operator,  B-r-n,  a  point 
source  which  originally  mapped  to  a  region  of  the  light  field  with  slope  m,  now  maps 
to  a  region  having  zero  slope. 

The  use  of  operators  in  this  section  was  based  on  a  similar  use  of  operators  in 
[11],  As  we  will  rely  on  operator  notation  in  the  next  section,  it  will  continue  to  be 
useful  to  adopt  conventions  and  operator  definitions  similar  to  those  employed  in  that 
paper. 

The  projection  used  earlier  to  form  a  conventional  image  from  the  light  field  is 
given  its  own  operator,  defined  as 

^[/(x)](s)  =  Y^Y^f(s,i,u,v).  (27) 

U  V 

The  composition  of  the  shearing  and  projecting  operators  can  be  defined  as  an  imaging 
operator,  since  it  results  in  the  generation  of  a  refocused  image, 


where  o  indicates  functional  composition. 


Sampling  Effects. 

Now  that  we  have  identified  a  procedure  for  generating  refocused  images  from 
the  light  held,  it  is  appropriate  to  examine  the  features  of  images  rendered  by  this 
procedure.  The  representation  of  the  plenoptic  camera  light  held  sampling  in  Fig.  6 
illustrates  that,  where  \m\  <  1,  the  radiance  from  a  single  point  source  is  localized 
to  one  pixel  per  subaperture  image.  However,  where  this  condition  is  not  met,  the 
number  of  pixels  per  subaperture  image  increases  to  the  order  of  n  =  round(m2) 
(Fig.  6  shows  a  number  of  pixels  per  row  of  n  =  round(m),  but  each  row  is  only 
one  dimension  of  a  subaperture  image).  This  is  an  important  observation  because  it 
indicates  that  there  is  a  limit  to  the  plenoptic  camera’s  ability  to  produce  refocused 
imagery.  Even  in  the  absence  of  an  optical  point  spread  function  due  to  aberrations 
or  diffraction,  it  is  not  always  possible  to  generate  from  the  light  held  imagery  in 
which  the  circle  of  confusion  clue  to  defocus  is  contained  within  a  single  image  pixel. 

Fig.  9  supports  this  result  by  means  of  a  plenoptic  camera  simulated  via  geometric 
raytracing.  The  hrst  row  of  the  figure  shows  the  distribution  of  irradiance  at  the 
detector  plane  of  the  camera  at  intervals  as  a  single  point  source  is  repositioned 
successively  further  from  the  camera.  The  second  row  shows  lines  of  diminishing 
slope  as  the  object  becomes  more  distant,  as  would  be  seen  in  an  EPI  slice  of  the 
light  held.  The  slopes,  fh.  in  this  case  are  much  smaller  than  unity,  so  only  one  pixel 
is  illuminated  per  row  for  each  line.  The  third  row  shows  the  effect  of  increasing  the 
detector  size  of  the  camera,  A q.  Since  A u  =  Aqlm/ld,  this  leads  to  an  increase  in 
A u  which  in  turn  leads  to  an  increase  in  7  =  Au/As  =  m/m  as  well  as  fh.  This 
brings  some  of  the  slopes  on  the  left  side  of  the  image  near  to  the  threshold  value  of 
unity,  and  these  lines  begin  to  shown  a  certain  amount  of  spreading.  The  spreading  in 
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Figure  9.  Point  Source  Moving  Away  from  Camera,  (a)  shows  simulated  raw  detector 
data  as  a  point  source  moves  away  from  the  camera  at  intervals,  (b)  shows  2D  slices 
of  the  light  fields  obtained  from  the  sensor  data  in  (a),  but  at  smaller  intervals,  (c) 
illustrates  the  spread  of  the  sampled  light  field  as  subpixel  size  increases. 
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the  figure  is  actually  somewhat  greater  than  would  be  expected  for  an  ideal  imaging 
system,  and  this  is  because  of  spherical  aberration  induced  by  the  main  camera  optic. 


3.5  Diffraction  Effects 

Analysis  of  light  field  sampling  by  a  plcnoptic  camera  up  to  this  point  has  as¬ 
sumed  a  geometric  optics  framework  free  of  aberrations  or  effects  of  diffraction.  This 
approach  is  helpful  for  setting  the  stage  for  the  plenoptic  camera,  but  the  effects  of 
these  factors  can  play  a  role  in  the  camera’s  design  and  performance.  This  section 
will  provide  a  framework  for  considering  the  effects  of  diffraction. 

Diffraction  is  a  phenomenon  with  origins  outside  of  the  ray  model  of  light.  The 
discussion  here  follows  a  far  more  detailed  treatment  of  the  subject  in  [18].  Early 
speculation  concerning  the  nature  of  light  propagation  proposed  that  each  point  on 
an  expanding  wavefront  would  expand  into  a  secondary  spherical  wavefront,  such 
that  the  envelope  of  these  secondary  wavefronts  constituted  the  new  wavefront.  This 
concept  is  known  as  the  Huygens  principle.  By  allowing  these  secondary  wavefronts  to 
interfere  with  each  other,  Fresnel  was  able  to  account  for  many  observed  diffraction 
effects.  It  was  not  until  later  on  that  this  approach  was  to  some  degree  grounded 
mathematically  in  Maxwell’s  equations,  first  by  Gustav  Kirchoff.  Modifications  of 
his  original  assumptions  led  to  the  Rayleigh- Sommerficld  formula,  as  commonly  used 
today  [18], 

l >(s,t)  =  1  [f  UM^i^cosOdudv.  (29) 

J  J  r oi 

where  r0i  is  defined  in  Fig.  10.  This  equation  gives  mathematical  expression  to  the 
notion  of  secondary  spherical  wavefronts  expanding  from  an  initial  wavefront.  The 
scalar  field  U  (a  non-physical  quantity  introduced  in  place  of  the  vector  field  to  make 


Figure  10.  Geometry  for  Diffraction  Analysis:  Geometry  to  be  used  in  discussing 
diffraction  from  an  aperture  or  lens.  P2  is  a  point  on  the  object,  P1  is  a  point  on  the 
aperture,  and  P0  is  point  on  the  observation  plane.  In  an  imaging  system,  a  lens  is 
placed  at  the  aperture  plane. 


the  problem  tractable)  at  the  plane  (s,  t)  is  given  by  the  superposition  of  spherical 
waves  originating  from  each  point  on  (u,v). 

In  the  case  of  light  passing  through  an  aperture,  various  approximations  to  this 
equation  can  be  made  depending  on  the  scale  of  the  aperture  compared  to  the  distance 
to  the  second  plane.  These  approximations  primarily  involve  the  number  of  terms 
retained  in  the  Taylor  series  expansion  of  roi-  For  example,  in  the  Fraunhofer  region, 
such  eliminations  lead  to  the  form  [18] 


U{s,t) 


A 

iXz 


U  ( u ,  v)  exp 


(su  +  tv) 


dudv , 


(30) 


where  A  is  a  phase  factor  dependent  on  z.  It  is  worth  noting  that  the  held  at  (s,  t) 
is  simply  the  scaled  Fourier  transform  of  the  held  at  (u,v).  The  distances  needed  for 
this  approximation  are  very  large  at  optical  wavelengths. 

The  Fraunhofer  approximation  is  noted  here  because  a  similar  result  is  obtained 
for  the  impulse  response  of  a  simple  imaging  system,  i.e.  the  held  found  at  the  image 
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plane  for  a  point  source  at  the  object  plane.  Such  a  system  is  modeled  by  starting 
with  a  spherical  wave  to  represent  the  point  source.  The  effect  of  a  thin  lens  can 
be  represented  as  a  phase  transformation  which  converts  this  diverging  wavefront 
into  a  spherical  wave  centered  on  the  image  point.  Eq.  29  is  then  used  to  model 
the  propagation  of  this  held  at  the  lens  aperture  to  the  image  plane.  Upon  making 
certain  substitutions  described  in  [18],  it  is  seen  that  the  impulse  response  is  of  the 
form  [18] 


h(s,  t ) 


A 

Xzi 


P(u,  v )  exp 


(su  +  tv) 


dudv 


(31) 


This  impulse  response  can  be  used  as  a  Point  Spread  Function  (PSF)  to  relate  the 
diffraction-limited  image  to  the  ideal  image  predicted  by  geometric  optics  only  for 
coherent,  monochromatic  illumination.  Only  under  this  condition  do  held  strengths 
add  linearly  in  order  to  make  this  approach  valid.  For  this  case,  the  diffraction-limited 
and  geometric  images  are  related  by  [18] 


Ui(s,t)  =  JJ  h(s  -  -r))Ug(£,  r))d£,dr]  =  h(s,t)  *Ug(s,t).  (32) 

For  an  incoherent  imaging  system,  the  property  of  linearity  is  observed  by  intensity 
rather  than  held  strength.  Therefore,  it  is  necessary  to  employ  a  PSF  which  operates 
on  intensity  images,  which  is  the  square  of  the  held  impulse  response  [18] 


II  \h(s-£,t-rj)\2Ig(£,rj)d£drj=\h(s,t)\2*Ig(s,t).  (33) 

For  the  case  of  a  circular  aperture,  the  PSF  is  given  by  the  airy  disc  pattern  [19] 


\h(s,t)  |2 


4J? 


ny/s2  +  t2\ 
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TT  Vs2  +  t2\ 

A//#  ) 


(34) 
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where  J\  is  the  first  order  Bessel  function.  Associated  with  the  point  spread  function 
is  its  Fourier  transform,  known  as  the  Optical  Transfer  Function  (OTF).  By  the 
convolution  theorem,  the  OTF  operates  by  multiplying  the  spectrum  of  the  ideal 
geometric  image  to  give  the  spectrum  of  the  diffraction  limited  image.  For  a  circular 
aperture,  the  OTF  is  given  by  [19] 


OTF1D(k,k0 ) 


0  :  k  >  ko 


(35) 


where  ko  =  1  / (A //#)  is  the  cutoff  frequency.  Because  the  OTF  for  a  circular  aperture 
falls  to  zero  for  frequencies  beyond  this  cutoff,  these  frequencies  will  not  appear  within 
the  final  image. 

Within  a  conventional  digital  camera,  high  spatial  frequencies  are  also  filtered  as  a 
result  of  sampling  of  the  image  by  discrete  detector  elements  in  the  focal  plane  array. 
According  to  the  Nyquist  sampling  theory,  the  samples  of  a  signal  spaced  at  p  are 
sufficient  for  exactly  reproducing  a  signal  composed  of  frequencies  lower  than  1/2 p 
[18].  If  the  original  signal  has  frequencies  higher  than  this  cutoff,  those  frequencies 
will  be  ‘folded’  into  lower  ones  as  aliasing. 

It  is  common  to  match  the  OTF  cutoff  frequency  to  the  sampling  cutoff  frequency 
in  order  to  avoid  aliasing  as  well  as  oversampling  [20].  Oversampling  increases  noise 
without,  in  most  cases,  providing  an  improvement  in  resolution.  For  a  conventional 
camera,  the  matching  condition  is  given  by 


kNYQ  ~2 FW  (36) 

For  a  central  wavelength,  this  equation  defines  a  relationship  between  the  lens  diam¬ 
eter,  focal  length,  and  pixel  pitch.  Given  the  additional  layer  of  microlenses  within  a 
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plcnoptic  camera,  it  is  not  surprising  that  the  role  of  diffraction  in  general,  and  the 
interplay  between  sampling  and  MTF  cutoffs  in  particular,  is  more  complicated  than 
for  a  conventional  camera.  Within  the  context  of  the  plenoptic  camera,  we  would  like 
to  match  the  MTF  cutoff  of  the  main  lens  to  the  cutoff  of  the  microlens  sampling, 
and  the  MTF  cutoff  of  each  microlens  to  the  sampling  cutoff  of  the  underlying  pixels. 

As  a  first  order  approach  to  this  problem,  we  assume  that  the  two  diffraction 
effects  are  decoupled,  i.e.,  that  spreading  at  the  microlens  plane  does  not  impact 
spreading  at  the  detector  plane.  Under  this  approximation,  the  effects  of  diffraction 
can  are  modeled  via  a  4D  point  spread  function,  given  by  the  convolution  of  the  2D 
PSF  associated  with  the  main  lens  with  the  2D  PSF  associated  with  the  microlenses. 
Since  the  two  PSFs  are  functions  of  independent  variables,  this  results  in 
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The  4D  optical  transfer  function  is  likewise  given  by 


OTFiD(k„  k„  ku,  kv)  =  OTFw(jk J+¥u,  k°,)OTF1D( +  kl,  k 


(38) 


where  ID  OTF  is  as  defined  in  Eq.  35,  and  the  cutoff  frequencies  are  defined  as 
k°st  =  D/lm\  and  k®v  =  As /l A,  according  to  the  general  definition  k0  =  l/( A //#)• 
Ideally,  these  cutoff  frequencies  should  be  matched  to  the  Nyquist  cutoff  frequencies 
associated  sampling  rates  implied  in  Eq.  15,  as  in  the  following: 

7.0  _  hNYQ  ,0  _  h,NYQ  /oq3 
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To  allow  for  the  possibility  that  these  two  constraints  may  not  be  simultaneously 
achievable,  we  introduce  coefficients  ci  and  C2, 


k°,  =  C[*:"ro,  =  c2C'°, 


(40) 


were  C\  >  1  implies  undersampling  at  the  microlens  plane,  which  may  lead  to  aliasing, 
and  C\  <  1  implies  oversampling.  The  same  holds  trne  for  C2  with  regard  to  the 
detector  plane.  Evaluating  for  the  various  cutoff  frequencies,  we  get 

D  ci  As  C2 

lm  A  2  As’  IdX  2A  q 

Dividing  the  two  equations  gives,  after  canceling  like  terms  and  rearranging, 


ci  _  D_  ld_  _  D_ 
c2  A  q  lm  A  u 


(42) 


which  follows  because  A u  =  A qlm/ld  and  D  =  NuAu.  This  result  is  very  interesting 
because  it  means  that,  for  a  plcnoptic  camera,  since  Nu  >  1,  it  is  impossible  for  both 
Ci  and  C2  to  equal  unity,  and  thus  it  is  impossible  to  simultaneously  match  the  OTF 
cutoff  of  the  main  lens  to  the  cntoff  of  the  microlens  sampling,  and  the  OTF  cntoff  of 
each  microlens  to  the  sampling  cntoff  of  the  underlying  pixels.  Rather,  it  is  necessary 
that  there  be  some  amount  of  nndersampling  at  the  microlens  plane,  oversampling  at 
the  detector  plane,  or  both. 

Fig.  11  graphically  illustrates  the  problem  for  a  plcnoptic  camera  having  Nu  =  3 
snbapertnres.  At  Ci  =  1,  the  main  lens  OTF  cntoff  is  perfectly  matched  to  the 
sampling  cntoff  (Nyqnist)  of  the  microlenses.  However,  the  microlens  OTF  cntoff 
is  short  of  the  Nyqnist  rate  for  the  detector  array,  indicating  oversampling  at  the 
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(b) 

Figure  11.  Plenoptic  Camera  OTF:  Relative  Scale.  The  figure  depicts  how  the  main 
lens  OTF  and  microlens  OTF  changes  with  respect  to  the  Nyquist  frequency  as  the 
parameter  C\  is  altered  for  the  a  camera  having  Nu  =  3  subapertures. 
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detector  plane.  At  C\  =  3,  the  microlens  OTF  is  matched  to  the  detector  sampling 
cutoff.  However,  for  this  case,  there  is  undersampling  at  the  microlens  plane. 

Fig.  11  is  not  helpful  for  assessing  what  is  the  optimal  value  of  c\  and  since 
it  masks  the  fact  that,  in  absolute  terms,  changing  these  parameters  will  impact  the 
value  of  the  Nyquist  frequency  for  either  the  main  lens  or  microlens,  depending  on 
how  the  change  is  effected.  In  order  to  examine  actual  performance,  it  is  necessary 
to  introduce  the  concepts  of  ground  sampled  distance  (GSD)  and  ground  spot  size 
(GSS)  [20].  These  concepts  reflect  the  fact  that  it  is  not  spatial  frequencies  resolvable 
at  the  image  that  are  important,  per  se,  but  rather  the  spatial  frequencies  at  the 
object.  In  other  words,  these  concepts  account  for  the  magnification  of  the  imaging 
system. 

In  general,  the  GSD  is  defined  as 


GSD  =  pj 


(43) 


where  p  is  the  size  of  the  pixel  or  whatever  is  performing  the  sampling,  h  is  the 
distance  to  the  object,  and  /  is  the  focal  length.  Corresponding  to  the  GSD  is  a 
ground  sampled  Nyquist  frequency,  k^,  defined  as 
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This  relationship  also  applies  for  the  ground  sampled  OTF  cutoff  frequency, 
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For  the  plenoptic  camera,  cutoff  frequencies  at  the  target  relate  to  cutoffs  at  the 
microlens  plane  resulting  from  microlens  sampling  and  the  main  lens  OTF.  Matching 
cutoff  frequencies  gives 
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G 
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c\kN, 


D_  =  lm 
A  h  12hAs 


(46) 


In  the  same  way,  cutoff  frequencies  at  the  detector  plane  brought  about  by  detector 
sampling  and  the  microlens  OTF  are  related  to  spatial  frequencies  at  the  main  lens 
plane.  The  L  superscript  is  introduced  in  order  to  refer  to  this  case: 


k%  =  c2k^,  or 


As 

A  lm. 


c2 

2A  u 


(47) 


We  now  wish  to  consider  the  case  of  a  camera  with  fixed  lens  diameter.  For  this 
case,  as  illustrated  in  the  equations  that  follow,  the  optical  cutoff  at  the  ground  plane 
and  the  sampling  cutoff  at  the  main  lens  plane  are  fixed.  Changing  c\  effects  a  change 
in  the  sampling  cutoff  at  the  ground  plane  and  the  optical  cutoff  at  the  main  lens 
plane,  and  is  achieved  by  altering  the  ratio  of  lm  to  As. 


kG  ~  D 
~  A  h 

(48) 

kL  _  Nu 

kN~  2D 

(49) 

jG  ID 

N  2hAs  Ci  A h 

(50) 

kL  _  As  _  Cl 

0  A  lm  2D 

(51) 

Fig.  12  shows  how  the  various  cutoffs  vary  in  terms  of  the  ground  sampled  fre¬ 
quency  for  a  plenoptic  camera  with  fixed  lens  diameter  and  Nu  =  3.  The  figure 
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(b) 

Figure  12.  Plenoptic  Camera  OTF:  Absolute  Scale.  The  figure  depicts  how  the  Nyquist 
frequency  and  OTF  change  with  the  parameter  ci  for  a  camera  with  Nu  =  3  subapertures 
and  a  fixed  lens  diameter  D. 
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confirms  that  it  is  impossible  to  improve  frequency  in  one  domain  without  reducing 
it  in  the  other. 

Finally,  Fig.  13  extends  the  picture  for  varying  subaperture  numbers  (Nu).  The 
figure  illustrates  that  it  is  possible  to  achieve  high  angular  sampling  by  increasing 
the  number  of  subapertures  and  setting  c\  equal  to  the  number  of  subapertures. 
However,  as  Cj  increases,  the  Nyquist  frequency  of  the  microlens  sampling  drops  far 
below  that  of  the  main  lens  OTF  cutoff.  Because  of  these  opposing  behaviors,  there 
is  no  preferred  value  of  C\  that  gives  overall  best  performance. 

In  future  sections,  it  will  be  convenient  to  ignore  the  effects  of  diffraction,  and 
to  assume  that  sampling  by  the  detector  elements  and  microlenses  constitutes  the 
limiting  factor  impacting  the  precision  of  the  camera’s  ranging  performance.  The 
preceding  figures  illustrate  that  as  long  as  C\  >  Nu  or  equivalently  c2  >  1  this  approach 
is  warranted,  since  where  this  is  true,  all  Nyquist  frequencies  fall  below  OTF  cutoff 
frequencies.  Requiring  that  c2  >  1  imposes  a  constraint  on  the  relationship  between 
the  plenoptic  camera  //#  and  the  detector  size.  Namely,  for  the  condition  to  be  met, 
it  must  be  true  that 

A  9  >(//#)!  =  (52) 

Fig.  14  provides  a  nomograph  relating  main  lens  diameter,  focal  length  (lm),  and 
wavelength  (A)  to  the  minimum  pixel  size  satisfying  Eq.  52.  For  optical  wavelengths 
at  f'/#’s  of  interest,  the  mininum  pixels  sizes  are  small  enough  so  as  not  be  be  a 
concern.  This  analysis  does  not  deal  with  optical  aberrations,  whose  impact  is  likely 
more  critical  in  an  optical  system.  However,  it  is  worth  noting  that  optical  aberrations 
do  not  effect  the  location  of  the  cutoff  frequency  of  the  OTF  [18]. 
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Figure  13.  Plenoptic  Camera  OTF  and  Sampling  Cutoff  Frequencies.  An  illustration  of 
how  the  optical  and  sampling  cutoffs  depend  on  C\  for  different  numbers  of  subapertures 
Nu.  If  ci  >  Njj ,  oversampling  will  be  avoided  and  the  effects  of  diffraction  can  be  safely 
ignored. 
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(a)  (b) 

Figure  14.  Plenoptic  Camera  Minimum  Detector  Size.  Detectors  smaller  than  the  scale 
indicated  by  this  nomograph  will  result  in  oversampling  of  the  optical  point  spread 
function. 
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3.6  The  Fourier  Transformed  Light  Field 


As  noted  previously,  focused  images  of  the  type  generated  by  a  conventional  cam¬ 
era  can  be  obtained  by  projecting  the  light  field  down  to  two  spatial  dimensions  by 
integrating  over  the  angular  dimensions,  u  and  v  (See  Eq.  20).  This  projection  may 
be  preceded  by  a  shearing  step  to  control  the  depth  at  which  objects  appear  in  focus. 
This  relationship  bears  a  strong  similarity  to  certain  forms  of  medical  imaging,  in 
which  x-ray  attenuation  provides  a  projection  of  the  density  distribution  of  a  bone 
or  tissue.  Computed  tomography  is  the  process  of  using  projections  along  different 
directions  to  reconstruct  the  original  3D  density  distribution.  A  common  approach  to 
this  problem  involves  utilizing  useful  relationships  between  the  density  distribution 
and  its  rotated  projections  within  various  transformed  domains. 

The  projection  slice  theorem  defines  this  relationship  for  the  Fourier  domain.  In 
its  most  basic  2D  form,  the  theorem  states  that  the  sequence  of  projecting  a  2D 
function  along  a  line  and  then  taking  the  ID  Fourier  transform  along  that  line  is 
equivalent  to  the  sequence  of  taking  the  2D  Fourier  transform  of  the  function  and 
then  extracting  the  ID  slice  along  the  same  line  (see  Fig.  15).  An  intuitive  basis  for 
the  theorem  is  explained  by  Malzbender  in  [21]: 

Any  point  in  the  frequency  domain  corresponds  to  a  sinusoid  with  some 
amplitude,  phase, and  orientation.  If  the  sinusoid  is  not  aligned  with  the 
projection  direction,  its  projection  will  sum  to  zero.  However,  those  com¬ 
ponents  aligned  with  the  projection  direction  sum  to  some  finite  value. 

This  set  of  components  with  nonzero  projections  can  be  found  in  the  fre¬ 
quency  domain  along  a  line  perpendicular  to  the  projection  direction. 

Ng  et  al.  in  [11]  were  the  first  to  demonstrate  the  projection  slice  theorem’s  extension 
for  use  with  the  higher  dimensionality  of  the  light  field.  They  discuss  refocusing 
in  the  Fourier  domain  with  continuous  variables.  While  their  discussion  is  useful 
for  proving  the  validity  of  the  concept,  it  does  not  address  some  of  the  details  of  a 
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Figure  15.  The  Projection  Slice  Theorem.  The  projection  slice  theorem  states  that  the 
projection  operation  in  the  spatial  domain  has  a  Fourier  domain  equivalent  of  taking 
the  central  slice  perpendicular  to  the  direction  of  projection.  The  spectrum  in  the  lower 
right  (d)  can  be  obtained  either  by  taking  the  ID  Fourier  transform  of  the  projection 
(b),  or  by  extracting  the  slice  along  the  dotted  line  in  the  spectrum  (c). 


practical  implementation,  which  calls  for  the  use  of  summations  and  discrete  Fourier 
transforms.  Here,  we  walk  through  the  essentials  of  the  math  for  the  the  discrete 
case,  and  show  where  it  is  important  to  modify  the  results  provided  in  [11], 

In  order  to  proceed,  we  must  augment  the  list  of  operators  defined  in  the  previous 
section.  To  begin,  we  define  the  spatial  frequency  variables,  ks,  kt,  ku,  and  kv,  and 
their  normalized  equivalents,  ks,  kt)  ku,  and  kv,  where  k  =  kAk  =  k/(N  —  1).  Nyquist 
for  the  two  cases  is  defined  as  k^  =  ±1/2  and  kjy  =  ±(N  — 1)/2,  respectively.  We  also 
allow  for  vector  indexing  using  the  definitions  k  =  \ks,kt,  ku ,  kv]T  and  k<,  =  \ks,  kt]T ■ 
The  4D  Discrete  Fourier  Transform  is  defined  as 
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We  use  the  letter  G  to  refer  to  the  Fourier  transformed  light  held,  as  in 

G(k)  =  ,FT4[A'(X)](k).  (54) 

We  also  define  a  slice  operator  in  the  Fourier  domain  which  returns  the  snbspace 
obtained  by  setting  ku  =  0  and  kv  =  0,  as  in 

S[/(k)](ks)  =  f(ks,  kt,  0,  0).  (55) 

Appendix  A  shows  that  the  sequence  of  shearing,  projecting,  and  Fourier  trans¬ 
forming  is  equivalent  to  the  sequence  of  Fourier  transforming,  shearing,  and  slicing, 
i.e. 

(FT2 0  V  o  B»)[/(x)]  =  (5  o  B-F  o  JT*)[/(x)]  (56) 

where 

1  0  - rh(Nu/Ns )  0 

0  1  0  — m(Nv/Nt ) 

0  0  1  0 

0  0  0  1 

From  this,  it  follows  by  direct  evaluation  that 

JrT2[im5f(s)](A;s)  =  G  ^ ks ,  kt,  -m^ks,  -m^k^  .  (58) 

This  means  that,  in  the  frequency  domain,  a  refocused  image  is  formed  simply  by 
taking  a  2D  slice  from  the  transformed  light  held,  in  contrast  to  the  projection  op¬ 
eration  required  in  the  spatial  domain.  The  ideal  image  formation  criterion  derived 
in  the  previous  section  can  be  easily  shown  within  the  frequency  domain.  Outside  a 
range  of  alpha  values,  ks  is  cropped  leading  to  lost  high  spatial  frequency  information. 
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Figure  16.  Fourier  Slice  Imaging.  In  the  Fourier  domain,  an  image  is  contained  within 
a  central  slice  of  the  light  field.  The  figure  illustrates  the  steepest  slope  at  which  slicing 
can  occur  before  cropping  of  high  spatial  frequencies  takes  place. 


In  order  to  avoid  cropping,  ku  must  remain  less  than  the  Nyquist  limit  of  Nu/2  when 
ks  =  Ns/2,  as  indicated  in  Fig.  16.  By  Eq.  58,  this  leads  directly  to  the  requirement 
that  m  <  1,  as  discussed  previously.  The  same  result  is  derived  with  reference  to  the 
continuous  light  held  in  [11]  under  the  assumption  of  band  limited  performance. 

Interpolation  is  required  in  order  to  extract  an  arbitrary  angled  slice  from  the 
evenly  sampled  4D  light  held  space.  This  interpolation  is  best  thought  of  as  a  recon¬ 
struction  of  the  original  continuous  light  held  function  from  the  sampled  points,  which 
can  be  represented  as  weighted  delta  functions  within  the  continuous  space.  Recon¬ 
struction  is  achieved  by  convolving  the  gridded  delta  functions  with  some  manner  of 
interpolation  hlter. 

Any  finite  impulse  response  (FIR)  hlter  will  have  a  Fourier  domain  transfer  func¬ 
tion  of  infinite  extent.  Fig.  17  shows  the  Fourier  transform  of  some  common  inter¬ 
polation  kernels.  When  these  kernels  are  used  as  interpolation  hlters  in  the  Fourier 
domain,  the  tiled  spatial  image  is  multiplied  by  this  Fourier  transform.  Regions  where 
the  transfer  function  is  non-zero  outside  of  the  central  tile  of  the  spatial  domain  tend 
to  show  up  as  a  faint  shadowing  or  aliasing  effect  in  the  hnal  image  [22] . 

In  principle,  this  problem  is  resolved  by  using  the  ideal  SINC  interpolation  hlter 
whose  Fourier  transform  is  a  RECT  function.  The  use  of  such  a  hlter  would  eliminate 
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(a)  Linear  Interpolation  Kernel 


(e)  Kaiser-Bessel  Reconstruction 


(d)  Fourier  Transform 


Figure  17.  Interpolation  Filter  Performance.  Interpolation  can  be  envisioned  as  using 
a  reconstruction  filter  to  recreate  the  original  continuous  function  from  the  sampled 
function,  and  then  resampling  at  the  new  rate.  Convolution  in  one  domain  is  equiva¬ 
lent  to  multiplying  by  the  Fourier  transform  of  the  convolution  filter  in  the  alternate 
domain.  When  the  Fourier  transform  is  nonzero  outside  of  the  central  tile  (indicated 
by  the  leftmost  hashed  line  in  plots  b,  d,  and  f),  information  from  those  regions  appear 
as  ghosting  or  aliasing  within  the  final  alternate  domain  image.  Kaiser-Bessel  interpo¬ 
lation  is  good  for  reducing  aliasing  because  the  central  lobe  of  its  Fourier  transform 
can  be  adjusted  to  falloff  close  to  the  boundary  of  the  central  tile. 
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Figure  18.  Refocused  Image  Comparison.  The  image  formed  using  cubic  interpolation 
in  (a)  shows  noticeable  aliasing  near  the  dots  and  edges  of  the  die.  These  artifacts  are 
reduced  considerably  by  using  Kaiser-Bessel  interpolation  (b). 


aliasing  by  cropping  out  only  the  central  tile  of  the  spatial  domain.  To  imitate  SINC 
interpolation  with  a  FIR  filter,  [22]  describes  a  process  of  iteratively  truncating  within 
both  domains  until  the  resulting  filter  is  sufficiently  localized  in  each.  The  result  of 
this  process  is  known  as  the  prolate  spheroidal  wave  function  (PSWF),  and  can  be 
approximated  by  the  Kaiser-Bessel  function,  which  has  the  form 


h(x) 


WVl1  -  (2x/w)2)) 

wW ) 


(59) 


where  /?  is  an  attenuation  factor,  w  is  the  window  width,  and  Jo  is  the  modified  zero 
order  Bessel  function  of  the  first  kind  [22],  Trial  and  error  showed  that  a  window  of 
w  —  3  and  (3  =  5  gave  good  results  for  this  application. 

Fig.  18  compares  the  results  of  Fourier  domain  refocusing  using  Kaiser-Bessel 
reconstruction  compared  to  cubic  interpolation.  Since  the  Fourier  transform  of  the 
Kaiser-Bessel  function  drops  off  within  the  central  image  tile  of  the  spatial  domain, 
the  refocused  image  will  show  a  drop-off  in  intensity  away  from  the  center.  Though 
[21]  implies  that  this  must  be  corrected  by  premultiplication  of  the  Light  Field  by 
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the  inverse  of  the  filter  Fourier  transform,  it  was  found  that  multiplication  following 
image  formation  provided  equal  results  and  greater  flexibility. 

A  worthwhile  note  with  respect  to  implementation  is  that,  in  applications  where 
the  zero  element  of  the  image  is  located  at  the  upper-left  corner  rather  than  the 
image  center,  it  is  often  necessary  to  rearrange  quadrants  of  the  spatial  domain  prior 
to  Fourier  transforming  [21]. 

3.7  Focused  Plenoptic  Camera  Sampling 

The  previous  sections  deal  with  a  camera  in  which  the  detector  array  is  separated 
from  the  microlens  array  by  one  microlens  focal  length,  so  that  the  microlenses  are 
focused  at  infinity.  Since  the  main  lens /microlens  separation  is  large  compared  to 
the  scale  of  the  microlenses,  this  distance  approximates  optical  infinity,  and  the  mi¬ 
crolenses  can  be  thought  of  as  being  focused  on  the  main  lens.  Thus,  this  arrangement 
results  in  a  direct  mapping  between  position  on  the  detector  array  and  position  on 
the  main  lens  plane. 

If  the  detector  array  is  placed  at  some  distance  from  the  microlens  other  than  the 
microlens  focal  length,  then  the  microlenses  will  image  a  plane  other  than  that  of  the 
main  lens.  The  arrangement  has  been  referred  to  as  the  ‘focused’  plenoptic  camera 
configuration  [23]. 

Fig.  19  gives  a  diagram  of  a  focused  plenoptic  camera,  in  which  the  image  of  the 
object  (the  arrow)  exists  at  the  same  location  as  the  conjugate  plane  of  the  detector 
array.  In  this  case,  each  microlens  reimages  a  region  of  the  primary  image.  The 
spacing  of  the  pixels  beneath  the  microlens  will  determine  how  densely  the  primary 
image  is  spatially  sampled  within  each  microlens  image.  For  the  case  of  Fig.  19,  we 
can  imagine  that  the  pixels  are  spaced  so  as  to  sample  the  primary  image  at  the  two 
locations  shown. 
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Figure  19.  Focused  Plenoptic  Camera.  In  a  focused  plenoptic  camera,  the  microlenses 
do  not  focus  to  the  back  of  the  main  lens,  but  to  a  plane  within  the  camera.  If  the 
main  lens  produces  an  image  at  this  point,  it  will  reimage  to  the  detector  plane. 


Figure  20.  Conventional  Plenoptic  Camera  Comparison.  The  sampling  characteristics 
of  a  focused  plenoptic  camera  can  be  closely  mimicked  by  a  conventional  plenoptic 
camera  with  appropriately  sized  and  positioned  microlenses  and  detector  elements. 

Fig.  20  shows  the  setup  of  a  conventional  plenoptic  camera,  where  the  microlens 
plane  has  been  placed  at  the  plane  that  was  conjugate  to  the  detector  plane  of  the 
focused  plenoptic  camera.  Here,  the  size  of  the  microlenses  determines  spatial  sam¬ 
pling  of  the  image  and  the  subpixel  spacing  determines  angular  sampling.  By  choosing 
the  correct  microlens  and  pixel  sizes,  the  figure  suggests  that  a  conventional  plenop¬ 
tic  camera  can  achieve  the  sampling  characteristics  of  a  focused  plenoptic  camera, 
though  the  subsequence  image  formation  process  from  the  raw  sensor  data  will  be 
quite  different. 

In  order  to  formalize  this  suggestion  of  equivalence,  we  examine  the  sampling  pat¬ 
terns  for  the  two  types  of  cameras.  First,  we  need  to  determine  how  the  conventional 
plenoptic  camera  samples  the  light  field  at  the  main  lens  (u)  plane  and  microlens  (s) 
plane.  For  equivalence,  this  sampling  must  be  matched  (in  terms  of  sampling  density) 
by  the  sampling  of  the  light  field  by  the  focused  plenoptic  camera  at  the  main  lens 
(u)  plane  and  the  plane  conjugate  to  the  detector  plane  (the  s'  plane  in  Fig.  21). 
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Figure  21.  Focused  Plenoptic  Camera  Geometry.  Geometry  of  a  focused  plenoptic 
camera.  The  s'  plane  is  the  conjugate  plane  of  the  detector  array. 

Fig.  22  shows  the  sampling  pattern  for  a  conventional  plenoptic  camera.  The  light 
received  by  a  pixel  is  constrained  first  by  the  stopping  performed  by  the  microlens  at 
the  microlens  plane,  and  next  by  the  spatial  extent  of  the  detector  itself — or  equiva¬ 
lently,  by  the  projection  of  the  detector  at  the  main  lens  plane.  The  figure  illustrates 
how  sampling  is  performed  at  the  microlens  plane  and  at  the  main  lens  plane. 

Fig.  21  provides  the  geometry  necessary  for  determining  the  focused  plenoptic 
camera  sampling.  Here,  a  is  the  distance  from  the  microlens  array  to  the  detector 
array,  and  a  and  b  are  related  by  the  lens  equation.  The  conjugate  plane  of  the 
detector  plane  is  designated  the  s'  plane.  The  figure  illustrates  a  number  of  similar 
triangles  formed  by  a  ray  of  light  passing  through  the  camera.  The  dimensions  of  the 
triangles  are  related  by 


s  -  s'  s'  -u  q  s-u 

—r— =  7 — i  =  ~  =  — •  (60) 

0  Ljyi  b  CL  Ljji 

For  a  single  microlens  (s  fixed),  we  see  that  sampling  in  u  (angular  sampling)  is 
dependent  on  pixel  size,  i.e.  A us  =  lm/aAq ,  where  we  use  the  s  subscript  to  indicate 
that  the  s  (the  microlens  location)  does  not  change.  Likewise  for  sampling  in  the  s' 
plane  (spatial  sampling):  A s's  =  b/aAq.  To  see  how  s'  changes  as  we  translate  across 
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Figure  22.  Conventional  Plenoptic  Camera  Sampling.  In  a  conventional  plenoptic 
camera,  sampling  in  the  microlens  plane  is  determined  by  the  microlens  size,  As,  while 
sampling  in  the  main  lens  plane  is  determined  by  the  magnified  detector  size. 

microlenses,  we  solve  Eq.  60  for  s'  in  terms  of  s  and  u,  to  give 


(61) 


from  which  we  see  that  A s'u  =  As(l  —  a/lm ).  Fig.  23  shows  the  sampling  pattern  for 
the  focused  plenoptic  camera,  which  illustrates  these  relationships. 

To  mimic  the  performance  of  a  focused  plenoptic  camera  with  a  traditional  plenop¬ 
tic  camera  with  microlenses  placed  in  the  s'  plane,  it  is  simply  necessary  to  ensure 
that  its  sampling  density  is  the  same  as  that  of  the  focused  plenoptic  camera.  Table 
2  shows  the  sampling  rates  for  the  two  variants,  and  Table  3  shows  how  parameters 
must  be  set  within  a  conventional  plenoptic  camera  to  mimic  the  performance  of  a 
focused  camera  with  given  parameters. 

Fig.  24  shows  a  possible  matching  of  sampling  patterns  between  a  focused  plenop¬ 
tic  camera  and  a  conventional  plenoptic  camera.  Notice  how  each  conventional  camera 
sample  contains  exactly  one  focused  camera  sample. 
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Figure  23.  Focused  Plenoptic  Camera  Sampling.  If  the  light  field  is  parameterized 
in  terms  of  s'  and  u,  focused  plenoptic  camera  sampling  is  dependent  on  a  number  of 
parameters  which  can  be  adjusted  to  mimic  traditional  plenoptic  camera  sampling. 


Table  2.  Conventional  and  Focused  Plenoptic  Camera  Sampling  Densities 


Conventional  Plenoptic  Camera 

Focused  Plenoptic  Camera 

Spatial  Sampling 

Angular  Sampling 

1/As 

NJD 

1  _  b  1 

As's  a  A q 

Nu  As's  Nu  Aq  a  /  f  -i  a  \ 

D  A s'u  D  As  b'  \  Irn) 

Table  3.  Conventional  and  Focused  Plenoptic  Camera  Equivalents 

Conventional  Plenoptic  Camera 

Focused  Plenoptic  Camera 

Microlens  Size 

N timer  of  Subpixels 

Agf 

(i  - 1) 

As 

Nu 

Subpixel  Size 

As  ( -i  a  \ 

Nu  \L  ImJ 

A  q 
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Figure  24.  Matched  Sampling  Performance  for  Focused  and  Traditional  Plenoptic 
Cameras:  Traditional  plenoptic  camera  and  conventional  plenoptic  camera  sampling 
patterns,  matched  via  the  equivalences  in  Table  3. 


It  is  worth  noting  that  the  sampling  of  the  two  cameras  is  not  identical  in  that  the 
mapping  that  relates  a  sensor  array  location  to  a  parameterized  light  held  coordinate 
is  much  different  for  the  two  cameras.  Techniques  for  generating  refocused  images 
directly  from  focused  plenoptic  camera  data  are  discussed  in  [23]  and  [24],  Gener¬ 
ation  of  a  light  held  with  a  2-plane  parametrization  is  presented  in  [25].  Despite 
differences  in  data  processing,  the  ability  to  create  a  conventional  plenoptic  camera 
which  samples  the  light  held  with  the  same  angular  and  spatial  sampling  densities 
as  a  focused  plenoptic  camera  means  that  the  focused  plenoptic  camera  need  not  be 
treated  as  a  separate  case  in  the  analysis  presented  in  this  paper.  Rather,  the  ex¬ 
pected  performance  for  a  given  focused  plenoptic  camera  may  be  determined  by  using 
the  equations  given  here  to  determine  the  equivalent  conventional  camera,  which  can 
be  used  for  further  analysis. 
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IV.  Plenoptic  Ranging 


4.1  Introduction 


The  fundamental  result  of  the  previous  chapter  is  that  a  point  in  object  space  is 
represented  by  a  2D  plane  within  the  4D  light  field.  The  orientation  of  the  plane  is 
directly  related  to  the  object’s  distance  from  the  camera,  as  well  as  other  fixed  camera 
parameters.  For  the  plenoptic  camera,  range  finding  is  the  operation  of  identifying 
these  region  and  determining  its  orientation. 

Eq.  8  shows  that  for  an  image  located  Zj  =  alm  from  the  main  lens,  the  light  field 
will  have  a  slope  m  =  ds/du  given  by 

Za  inc\\ 

m  —  —  —  - =  1 - -  H -  (62) 

Zi  Zi  }  z0 

where  z0  is  the  object  distance,  which  is  related  to  z%  by  the  lens  equation  (Eq.  6).  We 
use  the  term  ‘sampled  light  field  slope’  to  refer  to  the  slope  in  terms  of  s  samples  per 
u  sample,  m  =  ds/du  =  my,  where  7  =  Au/ As.  By  these  relationships,  a  difference 
in  distance  Sz  is  related  to  a  difference  in  slope  5m  or  in  sampled  slope  by 

sz  =  fsm  =  (63) 


Uncertainty  is  related  in  the  same  manner, 


_  _ 

O  z  j  Om  7  (~'m ) 

I'm 


(64) 


where  am  and  %  are  the  uncertainties  associated  with  m  and  m,  respectively,  and 
crz  is  the  uncertainty  associated  with  object  distance. 
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In  dealing  with  experimental  results,  uncertainty  is  determined  by  finding  the 
mean  square  error  (MSE)  or  root  mean  square  error  (RMSE)  of  the  estimated  quantity 
across  the  sample  represented  by  a  particular  light  field.  Mean  Squared  Error  is  well 
understood  to  be  defined  as 

1  N 

MSE  (x)  =  -J2(xi~Xi)2  (65) 

V  i= 1 

where  Xi  is  the  estimated  value  and  is  the  true  value.  On  the  other  hand,  in 
the  context  of  uncertainty  modeling,  variance  is  used  to  quantify  uncertainty.  The 
variance  of  an  estimator  is  calculated  typically  in  terms  of  the  variance  of  some  random 
variable  incorporated  into  a  simple  light  field  model.  Variance  is  defined  as 

1  N 

var(x)  =  —  ^(xi  -  x)2  (66) 

i= 1 

where  x  is  the  mean  value  of  the  sample  set.  The  two  metrics  are  related  by  [26] 

MSE(x)  =  var(x)  +  (Bias(x,x))2  (67) 

indicating  that,  for  an  unbiased  estimator,  the  metrics  should  be  equivalent.  For  this 
reason,  the  symbol  a  is  used  in  each  context,  whether  to  refer  to  RMSE  or  standard 
deviation.  Throughout  the  chapter  we  will  have  occasion  to  employ  a  few  properties 
of  the  variance.  The  first  is  a  scaling  property, 

var  (ax)  =  a2vai(x),  (68) 
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which  follows  directly  from  the  definition  of  variance.  The  second  is  known  as  the 
Bienayme  formula  [27],  which  states  that 


var 


N 

=  ^  var(x,;) 

i= 1 


(69) 


when  the  values  of  Xi  are  uncorrelated.  We  combine  these  two  properties  to  say  that 
va r(ax+by+c)  =  a2vai(x)  +  b2vai(y)  where  x  and  y  are  uncorrelated  random  variables 
and  a,  b,  and  c  are  constants. 

The  goal  of  this  chapter  is  to  obtain  an  expression  for  the  uncertainty  in  the 
sampled  light  field  slope,  cr)7l,  in  terms  of  parameters  intrinsic  to  the  sampled  light 
field,  i.e.  the  number  of  angular  samples,  Nu,  or  the  gradient  of  the  sampled  light  held. 
An  analysis  of  cr,m  is  particularly  useful  because  the  quantity  should  be  independent 
of  camera  parameters  such  as  microlens  size,  main  lens  diameter,  etc.  Thus,  the 
analysis  can  be  performed  on  a  light  field  recorded  by  an  arbitrary  camera,  and  then 
extrapolated  to  other  constructions  via  Eq.  64.  In  this  chapter,  we  examine  light  fields 
from  cameras  having  a  variety  of  sampling  characteristics  to  test  whether  the  sampled 
slope  uncertainty  can  truly  be  decoupled  from  camera  parameters  not  intrinsic  to  the 
sampled  light  field. 

Synthetic  light  fields  are  utilized  extensively  within  this  chapter  due  to  the  ease 
of  obtaining  ground  truth  depth  and  disparity  information.  Synthetic  light  fields  are 
typically  generated  by  some  type  of  3D  rendering  software,  by  translating  a  camera 
through  a  grid  of  positions  to  obtain  the  plcnoptic  camera’s  ‘subaperture  images.’ 
One  disadvantage  of  this  method  of  simulation  is  that  it  does  not  naturally  account 
for  the  spreading  effects  discussed  in  section  3.4. 

The  synthetic  light  fields  utilized  in  this  paper  are  generated  using  the  3D  model¬ 
ing  software,  Blender,  and  made  available  by  the  Heidelberg  Collaboratory  for  Image 
Processing  (HCI)  [28].  A  sample  light  field  is  shown  in  Fig.  25.  Though  vary- 
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Figure  25.  An  HCI  Lightfield.  Depths  are  assigned  physical  units  corresponding  to  the 
camera  parameters  given  in  Table  4. 


ing  camera  sampling  characteristics  would  be  performed  most  naturally  and  without 
constraints  by  returning  to  the  original  Blender  scenes,  within  this  work  this  option 
was  forgone  for  the  simplicity  of  simulating  tradeoffs  by  performing  resampling  of  the 
full  light  fields.  This  method  does  not  allow  for  the  addition  of  information,  so  any 
tradespaces  explored  must  involve  courser  sampling  than  the  original  rendered  light 
held. 

The  HCI  light  fields  have  an  angular  resolution  of  9  x  9  and  spatial  resolution  of 
768  x  768.  Based  on  information  provided  about  the  setup  of  the  Blender  rendering 
environment,  the  light  held  sampling  can  be  related  to  that  of  a  plcnoptic  camera 
with  the  characteristics  given  in  Table  4. 


Table  4.  HCI  Light  Field  Camera  Parameters 


D  f 

A  q 

As  A  u  Nu 

Ns 

Ws 

0.56m  0.95m 

lm 

15.5/im 

138.9/im  6.25cm  9 

768 

10.7cm 

Figs.  26  and  27  illustrate  three  ways  in  which  the  light  held  can  be  resampled 
in  order  to  investigate  the  role  of  different  parameters.  1)  Simply  cropping  the  light 
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Figure  26.  Plenoptic  Camera  Tradeoffs.  The  HCI  lightfleld  can  be  resampled  to  in¬ 
vestigate  the  role  of  various  camera  parameters.  In  (a),  lens  diameter  and  detector 
size  are  varied  to  increase  Nu  while  keeping  Ait  and  microlens  size  constant.  In  (b) 
Nu  is  increased  by  changing  only  detector  size,  (c)  involves  the  tradeoff  in  spatial  and 
angular  resolution  achieved  by  varying  microlens  size. 


~Aq  ~D  #As  #Au  ~Aq  #D  #As  #Aq  #D  -As 


■* - ► 

s 

Figure  27.  Tradeoff  Sampling.  Resampling  required  for  the  simulated  cameras  in 
Fig.  26. 


57 


field  in  the  angular  dimensions  by  successively  decreasing  amounts  is  equivalent  to 
expanding  the  camera  diameter  while  keeping  microlens  size  constant  and  decreas¬ 
ing  detector  size.  2)  Holding  all  parameters  constant  while  changing  pixel  size  can 
be  simulated  by  resampling  within  only  the  angular  dimensions.  3)  Changing  the 
microlens  diameter  results  in  a  tradeoff  between  angular  and  spatial  sampling  rates. 
This  tradeoff  is  simulated  by  resampling  in  both  the  spatial  and  angular  dimensions. 
In  resampling,  it  is  crucial  that  proper  interpolation  be  employed  to  eliminate  alias¬ 
ing.  Here,  low  pass  Gaussian  filtering  was  employed  to  remove  all  frequencies  above 
Nyquist  prior  to  downsampling  via  nearest-neighbor  interpolation. 

Three  slope  estimation  frameworks  are  examined  in  the  chapter.  The  first  utilizes 
a  feature  matching  algorithm  to  determine  correspondences  between  images,  resulting 
in  a  sparse  3D  point  cloud.  The  second  approach  can  be  thought  of  as  an  extension  of 
traditional  image  correlation  techniques  to  the  expanded  light  held  space.  It  looks  for 
minima  in  the  variance  calculated  along  different  slopes  within  the  light  held.  Finally, 
a  Fourier  domain  ranging  technique  is  explored,  and  its  performance  is  evaluated. 

4.2  3D  Point  Clouds  using  Feature  Matching 

Image  registration  provides  one  avenue  of  approach  to  the  depth  estimation  prob¬ 
lem.  The  major  elements  of  an  image  registration  algorithm  are  a  feature  detector 
and  descriptor.  The  existence  of  a  feature  detection  step  sets  this  method  apart  from 
many  of  the  others  to  be  discussed.  Having  such  a  step  means  that  the  resulting 
depth  map  will  be  to  some  degree  sparse-i.e.,  a  depth  estimate  will  not  be  generated 
for  every  pixel  in  a  rendered  image  of  the  scene.  The  advantage  closely  related  to  this 
is  that,  once  a  feature  has  been  detected,  it  is  typically  a  small  matter  to  estimate 
its  location  with  a  sub-pixel  level  of  accuracy.  For  example,  given  a  line  of  pixels 
identified  by  an  edge  detector  to  constitute  an  edge,  the  location  of  the  edge  in  sub- 
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pixel  space  can  be  estimated  with  a  simple  linear  fit.  An  approach  similar  to  this  is 
described  below  for  the  feature  detection  employed  in  this  section. 

Once  a  collection  of  features  has  been  identified,  a  descriptor  vector  is  generated 
for  each  feature  using  the  region  surrounding  the  feature  within  the  image.  The 
descriptor  must  capture  the  salient  attributes  of  the  surrounding  to  give  a  distinctive 
description,  capable  of  distinguishing  the  feature  from  all  others  detected  within  the 
scene.  These  descriptor  vectors  are  then  matched  with  other  descriptor  vectors  using 
Euclidean  distance,  spectral  angle,  or  some  other  classifier,  to  establish  a  mapping 
between  the  two  images.  Image  registration  algorithms  are  often  designed  to  be 
robust  to  translations,  rotations,  and  scalings  of  an  original  image.  Thus,  it  is  very 
important  that  the  detector  be  able  to  identify  the  same  features  within  an  image 
under  these  effects. 

This  section  provides  a  theoretical  framework  for  predicting  the  expected  un¬ 
certainty  within  the  context  of  feature  matching.  Features  within  separate  images 
are  assumed  to  be  correctly  detected  and  matched,  such  that  any  error  results  from 
feature  localization  error.  Feature  localization  is  treated  alternately  with  the  assump¬ 
tion  of  pixel  level  accuracy  and  the  assumption  of  error  with  a  normally  distributed 
probability  density  function. 

Quantization  Error. 

The  case  of  feature  localization  to  within  pixel  accuracy  can  be  treated  by  the 
model, 

Si  =  mui  +  e,  (70) 

where  e  is  uniformly  distributed  over  (—As/2,  As/2). 
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Figure  28.  Quantization  Error  Visualization.  The  figure  shows  the  range  of  possible 
slopes  when  a  line  is  known  with  pixel-level  precision. 


For  such  a  case,  rh  is  a  maximum  likelihood  estimator  of  m  if  and  only  if  < 
rh  <  rh. where 


rh+  =  min 


Si  +  Au/2 


Ui 


(71) 


and 


m_  =  max 


(72) 


give  the  extrema  of  slopes  falling  within  the  bounds  of  the  error  distributions,  as 
shown  in  Fig.  28.  The  estimator  rh'  =  (m++m_)/2  has  certain  optimality  properties, 
discussed  in  [29].  Its  uncertainty  is  given  by  Am  =  rh+  —  m_. 


Normally  Distributed  Error. 

The  assumption  of  normally  distributed  error  is  useful  because  it  leads  to  a  clean 
analytic  result.  The  model  underlying  this  section  is  given  by 


Si  =  mui  +  n,  (73) 

where  n  is  a  zero- mean  normally  distributed  random  variable  with  variance  cr^,  and 
Ui  =  iAu,  where  i  G  {1,1VU}.  Given  this  model,  simple  linear  regression  provides 
the  optimal  estimator  for  m,  with  rh  =  cov(s,  u)/vax(u),  where  the  variance  and 
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covariance  are  well  understood  to  be  defined  as 


cov(>,  u)  =  —  £>-  <  Ui  > ) (Sj—  <  Si  >) 


var(w)  =  —  >)2  (75) 

u  i= 1 

where  the  angular  brackets  are  used  to  denote  the  mean  of  the  enclosed  variable.  The 
covariance  can  be  rewritten  by  substituting  in  Eq.  73,  as  in 


cov(s,  u)  =  —  <  ui  >)(mui  +  rii  —  m  <  u.i  >). 


Upon  factoring,  this  gives 


cov(s,  u)  =  —  m£(ut-  <  Ui  >)2  +  Yfa-  <  Ui  >)rii  . 


Substituting  these  expressions  for  covariance  and  variance  into  the  equation  for  m, 
we  obtain  an  updated  expression  for  the  slope  estimator: 


m  =  cov(s,  u)/var(u)  =  m  + 


^(■Ui-  <  Ui  >)rii 
2=1 

y ^(uj’~  <  ui  >)2 


Employing  the  fact  that  var (ax  +  by  +  c)  =  a2a2xJrb~(j2y  if  a,  b,  and  c  are  constants  and 
x  and  y  are  random  variables,  as  discussed  in  the  chapter  introduction,  the  variance 
of  the  estimator  directly  reduces  to 


var  m  = 


Nvvar(u) 
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Assuming  that  Nu  is  odd,  the  denominator  can  be  written  as 


Nu  {Nu- 1)/2 

Nuvax(u)  =  ^  (tq2—  <Ui  >)“  =  2A  u2  ^  i2  (80) 

2—1  2—  1 

where  we  have  used  the  definition  of  tq  and  a  reindexing  to  provide  an  equivalent 
expression.  The  series  i 2  is  a  square  pyramidal  number  having  a  known  analytic 
sum  of  n(n+  l)(2n  + 1)/6  given  by  Faulhaber’s  formula  [30].  This  substitution  allows 
for  more  convenient  expression  as 

Nuvar(u)  =  A u2Nu(Nu  -  1  )(NU  +  1)/12  «  D2NU/ 12.  (81) 


This  definition  can  be  substituted  back  into  Eq.  79  to  obtain  a  final  expression  for 
the  uncertainty  in  m: 


(82) 


The  result  in  terms  of  the  sampled  light  field  error  is  obtained  via  <jm 
amAu/As. 


_  an/As  [~12  _  oy  [ 12 
Nu  \Inu~Nu\Inu 


'~Y&m 


(83) 


where  an  becomes  the  registration  error  as  a  fraction  of  pixel  size. 

The  preceding  calculations  apply  to  the  case  where  the  slope  is  estimated  from 
a  single  2D  slice  of  the  4D  light  field  where  t  and  v  are  fixed.  In  evaluating  the 
fundamental  performance  limitation  for  estimating  from  the  full  4D  space,  we  assume 
that  Nu  samples  in  u  are  available  for  each  of  Nv  =  Nu  values  of  v.  Working  through 
the  same  process  for  Ui  =  Aw{l,  1...,  2,  2...,  Nu,  Nu...}  for  a  total  of  N2  samples  gives 
an  improved  uncertainty, 


gn  ~  crn  \/l2 

D  ^/N2-  2  ~  D  Nu' 


(84) 
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Once  again,  the  error  in  terms  of  the  sampled  light  held  is  given  by 

12 

&rh  & n  '  (85) 

J  *it 

Stereo  Ranging  with  Normally  Distributed  Error. 

This  section  provides  a  simple  framework  for  comparing  predicted  performance 
between  a  plenoptic  and  stereoscopic  system.  Fig.  29  shows  the  geometry  of  the 
system  to  be  considered.  Within  such  a  stereo  vision  system,  the  location  of  an 
object  within  each  camera’s  image  specifies  a  line  traveling  out  from  the  camera  into 
object  space.  These  two  lines  form  a  simple  linear  system  which  can  be  solved  to  give 
a  depth  estimate  in  terms  of  the  disparity,  d,  in  the  image  location  between  the  two 
cameras, 

zest  —  Bf  /d  —  f  /m  (86) 

where  B  is  the  baseline  separating  the  two  camera  axes  and  /  is  the  focal  length  of 
the  pinhole  cameras  [1].  For  simpler  comparison  with  the  plenoptic  camera,  we  have 
reformulated  the  result  in  terms  of  a  slope,  m  =  d/B. 


B 

Figure  29.  A  Simple  Stereo  Ranging  Setup.  The  diagram  represents  two  pinhole 
cameras  with  parallel  optical  axes  separated  by  distance  B. 
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Figure  30.  Equal  Baseline  for  Stereoscopic  and  Plenoptic  Systems. 


We  assume  that  feature  registration  between  the  two  images  is  performed  with 
some  normally  distributed  error.  That  is,  d  =  qi  —  q2,  where  qi  and  g2  ate  ran¬ 
dom  variables  with  normally  distributed  probability  density  functions  (PDF)  having 
standard  deviation  a.  The  PDF  of  the  sum  of  two  random  variables  is  equal  to  the 
convolution  of  the  original  PDFs.  Since  the  convolution  of  two  Gaussian  distributions 
is  a  third  Gaussian  distribution,  having  o\  =  a\  +  erf,  it  follows  that  the  PDF  of  d  is 
normally  distributed  with  a  variance  of  erf  =  2er2.  The  uncertainty  in  the  slope,  m, 
expressed  as  a  standard  deviation,  is  then  given  by 


°A 

B 


(87) 


This  equation  will  be  useful  in  evaluating  results  within  the  next  section. 

An  interesting  comparison  involves  the  case  where  the  stereo  baseline,  B ,  is  equal 
to  the  plenoptic  lens  diameter  D  (See  Fig.  30),  and  both  systems  have  the  same 
pixel  size,  A q.  We  assume  that  the  feature  localization  error,  cr  is  proportional  to  A q 
for  the  stereo  system  and  As  for  the  plenoptic  camera.  Using  Eqs.  82  and  87,  the 
uncertainties  are  related  by 

KOpZen  =  ^  (g8) 

m)  ster 
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This  comparison  gives  some  sense  of  the  advantages  and  disadvantages  of  each  system. 
In  general,  both  types  of  systems  appear  to  operate  on  the  same  general  playing  field. 
The  plcnoptic  camera  has  the  advantage  of  being  a  monocular  system  with  a  minimal 
hardware  requirement.  Its  primary  disadvantage  is  that  its  performance  is  coupled  to 
lens  size,  which  is  more  limited  than  the  camera  baseline  of  the  stereo  system.  Since 
this  equation  is  derived  within  the  context  of  feature  matching,  it  also  does  not  take 
into  account  the  benefits  of  other  approaches  to  light  field  ranging  to  be  discussed  in 
later  sections,  which  improve  upon  feature  matching  performance. 

Estimator  Comparison. 

Fig.  31  compares  the  estimators  described  in  this  section  for  the  cases  of  quanti¬ 
zation  error  and  normally  distributed  error.  The  figure  illustrates  that,  for  the  case 
when  feature  location  is  known  within  pixel  accuracy,  the  maximum  likelihood  esti¬ 
mator  is  superior  until  there  are  more  than  about  20  angular  samples.  In  general,  the 
stereo  estimator  is  much  worse  for  this  case. 

When  the  feature  location  estimate  is  subject  to  a  normally  distributed  error 
term,  there  is  very  good  agreement  between  the  error  obtained  via  simulation  and 
the  expected  error  derived  previously.  The  error  resulting  from  SLR  estimation  falls 
off  noticeably  faster  with  the  number  of  angular  samples  than  the  stereo  estimation. 

An  unexpected  result  is  that,  in  the  case  of  normally  distributed  error,  it  is  possible 
to  achieve  better  performance  than  obtained  using  simple  linear  regression  by  using 
a  modification  of  the  maximum  likelihood  estimator  for  the  case  of  uniform  error. 
The  limits  in  Eqs.  71  and  72  represent  the  constraints  on  possible  slopes  imposed 
by  the  collective  uncertainty  limits  of  the  data  points.  In  the  context  of  normally 
distributed  error,  this  is  not  a  meaningful  concept,  as  no  slope  is  impossible,  however 
improbable.  By  scaling  the  A u  term  in  each  of  the  equations,  we  can  select  by  trial 
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Figure  31.  Slope  Estimator  Performance.  Plot  (a)  applies  the  Maximum  Likelihood 
Estimator  (MLE),  Simple  Linear  Regression  (SLR),  and  Stereo  estimator  to  the  case  of 
a  rasterized  line  s  =  round(mu  +  6),  where  m  was  sampled  at  1000  evenly  spaced  points 
between  0  and  100,  and  b  was  sampled  at  1000  points  between  -1  and  1.  The  MLE  is 
superior  at  small  angular  sample  numbers,  but  increasingly  gives  out  to  the  SLR  at 
higher  numbers.  Plot  (b)  gives  the  theoretical  and  simulated  Root  Mean  Square  Error 
for  the  case  of  a  line  with  normally  distributed  error,  s  =  mu+n ,  where  an  =  0.28  in  order 
to  match  the  standard  deviation  of  a  uniform  distribution  of  unit  width.  Since  changing 
m  was  not  found  to  affect  results,  rn  was  fixed  at  0.1.  The  RMSE  was  calculated  over 
10000  samples  for  each  point.  The  agreement  between  the  simulation  and  theory  is 
strong  for  both  the  SLR  and  stereo  case,  indicating  that  the  formulas  derived  in  the 
previous  sections  are  valid.  An  unexpected  result  is  that  a  modification  of  the  MLE 
for  uniform  error  gives  improved  performance  over  the  Simple  Linear  Regression. 
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and  error  the  range  which  gives  the  best  slope  estimation  performance.  The  improved 
performance  in  Fig.  31  (b)  was  obtained  by  using  a  multiplicative  factor  of  2.  The 
improvement  in  performance  over  the  SLR  estimator  is  likely  due  to  the  fact  that  the 
MLE  estimator  as  we  have  presented  it  uses  knowledge  about  the  zero  u-intercept  of 
both  of  the  models  discussed  in  this  section,  whereas  the  SLR  method  assumes  that 
the  intercept  is  unknown.  In  reality,  the  intercept  will  be  some  unknown,  non-zero 
value.  We  expect  that,  if  an  estimator  for  the  case  of  unknown  intercept,  as  described 
in  [29],  were  to  be  employed,  its  performance  would  not  exceed  that  of  the  simple 
linear  regression.  For  this  reason,  the  simple  linear  regression  estimator  is  employed 
within  the  next  section. 

4.3  The  Scale  Invariant  Feature  Transform 

The  Scale  Invariant  Feature  Transform  (SIFT),  as  described  by  David  Lowe,  is  one 
common  algorithm  for  matching  features  between  images  taken  of  the  same  subject 
[31],  and  has  become  the  present  ‘gold  bar’  standard  for  image  registration  within  the 
computer  vision  community.  This  section  describes  the  SIFT  algorithm. 

SIFT  Feature  Detection. 

In  order  to  achieve  scale  invariant  feature  detection,  SIFT  first  generates  a  scale 
space  representation  of  an  image.  Scale  space  adds  an  additional  parameter,  a,  to 
an  image,  im(x,  y ),  such  that  im(x,  y,  a)  gives  a  blurring  of  the  original  image  to  the 
point  where  only  features  on  the  scale  of  cr  can  be  discerned.  Blurring  is  performed 
using  a  Gaussian  filter,  which  allows  repeated  convolutions  to  be  efficiently  utilized 
to  keep  kernel  size  small  even  as  a  grows  large.  Downsampling  at  each  doubling  of  a 
is  also  used  for  the  same  purpose. 
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Feature  detection  within  scale  space  is  performed  using  a  edge  detector  known  as 
the  Difference  of  Gaussians  (DoG).  The  DoG  representation  of  an  image  is  obtained 
by  subtracting  the  image  at  one  scale,  im(x,y,a),  from  an  image  at  a  larger  scale, 
im(x,  I/,  kcr). 

The  Gaussian  convolution  involved  in  the  generation  of  scale  space  is  equivalent  to 
low  pass  filtering  of  the  image.  When  one  low  pass  filter  is  subtracted  from  another, 
wider  low  pass  filter,  the  result  is  a  band  pass  filter.  Thus,  the  response  of  the  DoG 
filter  will  be  highest  to  those  features  whose  scale  puts  them  within  the  passband  of 
this  filter.  A  point  is  considered  a  local  extreme  when  it  is  uniformly  larger  or  smaller 
than  all  26  of  its  nearest  neighbors  within  scale  space. 

For  a  simple  analytic  example,  we  can  consider  a  feature  consisting  of  a  Gaussian 
blob  with  a  standard  deviation  er.  The  Gaussian  filter  used  in  scale  space  generation 
starts  at  cro  and  grows  by  a  factor  of  k  —  21/5,  where  S  is  the  number  of  steps  per 
octave.  The  scale  space  representation  is  then  given  by  a  Gaussian  with  variance 
of  =  er2  +  offc2",  which  at  its  peak  has  a  value  of 


Sn 


1 

a/27t(o‘2  +  ofA;2™) 


(89) 


Setting  the  second  derivative  of  this  expression  equal  to  zero  gives  the  location  where 
difference  Sn  —  Sn+ 1  reaches  an  extrema.  This  is  easily  shown  to  result  in 


Oo  kn  =  VZcr, 


(90) 


indicating  that  the  DoG  will  reach  an  extrema  where  the  scale  space  scale  is  on  the 
order  of  a  feature  scale.  This  is  illustrated  graphically  in  Fig.  32.  The  first  column 
shows  different  layers  of  the  scale  space  representation  of  a  Gaussian  blob  with  a 
standard  deviation  of  10.  The  second  column  shows  the  difference  taken  between 
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Figure  32.  Difference  of  Gaussians  Feature  Detector.  The  first  column  shows  the  scale 
space  representation  of  a  feature  consisting  of  a  Gaussian  ’blob’  with  standard  deviation 
10.  The  DoG  spatial  peak  reaches  a  maximum  when  scale  spaces  used  to  form  the  DoG 
are  near  the  scale  of  the  original  feature. 


successive  layers  of  scale  space.  The  peak  of  the  difference  is  plotted  in  the  final 
column,  and  is  seen  to  reach  a  maxima  where  the  scale  space  a  is  near  to  that  of  the 
original  feature. 


SIFT  Descriptor. 

When  features  are  detected  at  a  particular  scale,  a  feature  descriptor  is  compiled 
based  on  the  feature’s  surroundings  at  that  scale.  This  ensures  that  those  same 
surroundings  will  be  nsed  to  build  a  descriptor  in  any  other  image  at  an  arbitrary 
scale  where  the  same  feature  is  detected.  Since  SIFT  is  meant  to  accommodate  the 
possibility  of  such  changes  in  scale  between  images,  all  features  detected  in  one  image, 
regardless  of  scale,  will  traditionally  be  compared  with  all  features  of  another  image 
during  the  matching  stage. 

The  SIFT  descriptor  uses  image  gradient  information  to  characterize  the  local 
neighborhood  of  a  feature.  The  gradients  calculated  within  an  Nx  by  Ny  region  are 
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first  multiplied  by  a  Gaussian  weighting  function,  and  then  sliced  into  any  number 
of  smaller  subregions  (The  original  SIFT  implementation  described  by  Lowe  used 
a  4  by  4  grid  of  subregions).  Histograms  of  the  image  gradient  in  each  region  are 
then  concatenated  to  form  a  descriptor  vector  for  the  feature.  To  achieve  rotation 
invariance,  the  direction  of  the  maximum  gradient  is  first  subtracted  from  all  gradient 
orientations  prior  to  histogram  formation. 

SIFT  Implementation. 

Though  SIFT  is  designed  to  perform  image  registration  in  the  presence  of  im¬ 
age  scaling,  rotation,  and  translation,  not  all  of  these  factors  are  present  within  the 
plenoptic  ranging  problem.  Eliminating  these  extra  degrees  of  flexibility  allows  for 
the  development  of  a  modified  feature  matching  algorithm,  which  should  outperform 
a  full  application  of  SIFT.  A  few  of  these  changes  are  listed  here. 

1)  Since  neighboring  subaperture  images  are  not  rotated  from  each  other,  feature 
vectors  do  not  need  to  use  orientations  relative  to  the  gradient  of  greatest  magnitude. 
This  stands  to  eliminate  errors  resulting  when  two  gradients  are  of  nearly  the  same 
magnitude. 

2)  To  achieve  scale  invariance,  SIFT  builds  feature  vectors  from  the  scale  space 
layer  at  which  the  feature  was  detected.  Descriptor  vectors  are  thus  ‘scale  normalized,’ 
and  can  be  compared  to  features  detected  at  any  scale  within  a  separate  image.  Since 
no  rescaling  occurs  between  subaperture  images,  in  principle  it  should  be  possible  to 
only  compare  a  feature  in  one  image  with  features  detected  at  the  same  scale  within 
a  neighboring  image.  In  practice,  we  allow  all  features  within  the  same  octave  to  be 
compared,  as  the  DoG  detector  will  not  with  perfect  consistency  detect  a  feature  at 
the  same  scale  under  image  translation. 
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3)  To  accommodate  arbitrary  translation,  rotation,  and  scaling,  SIFT  compares 
all  features  detected  within  one  image  to  all  features  within  a  second  image.  This 
means  that  the  location  of  a  feature  must  be  definite  in  all  dimensions.  For  example, 
edges  whose  location  along  the  edge  is  difficult  to  define,  must  be  culled  by  the 
SIFT  algorithm.  In  plenoptic  range  finding,  the  transformation  between  neighboring 
subapture  images  is  constrained  to  a  translation  along  a  known  direction.  This  has 
two  consequences.  Firstly,  features  need  only  to  be  matched  with  features  in  the 
second  image  along  the  known  line  of  translation.  Secondly,  detection  of  edges  can  be 
allowed  and  encouraged  by  removing  the  requirement  for  the  DoG  to  be  a  maximum 
in  the  direction  transverse  to  translation. 

Feature  Matching. 

Feature  matching  is  performed  by  searching  for  the  least  Euclidean  distance  be¬ 
tween  feature  vectors.  In  cases  where  a  feature  in  one  image  does  not  have  an  equiv¬ 
alent  within  the  second  image,  due  to  either  detection  failure  or  false  detection,  it  is 
expected  that  the  minimum  Euclidean  distance  will  not  deviate  far  from  the  next- 
smallest  distance.  In  order  to  remove  such  cases,  only  matches  are  retained  where 
the  minimum  distance  is  smaller  than  all  other  distances  by  at  least  some  specified 
factor  which  we  will  refer  to  as  a  matching  threshold. 

Stereo  Ranging  using  SIFT. 

In  this  section  we  examine  the  results  of  stereo  matching  using  SIFT.  Stereo 
matching  is  performed  using  the  two  extreme  subaperture  images  of  a  synthetic  light 
field.  Since  ground  truth  for  depths  within  the  synthetic  light  field  is  known,  this  can 
be  used  to  calculate  the  actual  disparity  for  each  image  point.  The  two  performance 
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metrics  examined  in  this  section  are  the  variance  of  the  estimated  disparity  from  the 
actual  disparity  across  the  entire  imag,  and  the  number  of  matches. 

SIFT  performance  is  based  on  a  number  of  different  parameters  alluded  to  in 
the  previous  sections.  The  starting  scale  of  the  first  octave  of  scale  space,  which 
gives  the  amount  of  initial  blurring  of  the  image,  determines  the  scale  of  the  smallest 
features  that  will  be  detected.  A  detection  threshold  determines  how  large  a  local 
DoG  extrema  must  be  in  absolute  value  in  order  to  be  classified  as  a  feature.  Finally, 
the  matching  threshold  determines  how  close  a  match  must  be,  compared  to  other 
close  matches,  in  order  to  be  retained  as  a  true  match.  Each  of  these  parameters 
was  varied  across  a  range  of  values,  using  both  the  author’s  modified  implementation 
of  SIFT  and  a  standard  implementation  called  VLFeat,  meant  to  closely  mimic  the 
specifications  of  Lowe’s  paper  [32], 

Fig.  33  shows  the  results  for  the  authors  SIFT  implementation,  while  Fig.  34 
shows  the  results  for  VLFeat.  As  expected,  the  author’s  implementation  does  provide 
a  much  greater  volume  of  matches,  with  all  parameters  being  equal.  However,  the 
variance  achieved  with  VLFeat  is  also  considerably  lower  than  that  of  modified  SIFT. 

Across  the  board,  increasing  the  matching  threshold  leads  to  better  performance 
at  the  cost  of  match  count.  Both  of  these  effects  are  expected.  Though  the  number 
of  matches  continues  to  drop  as  the  matching  threshold  is  increased,  the  falloff  in 
variance  diminishes  quickly  after  a  value  of  about  2,  making  this  an  optimal  choice. 
Increasing  the  detection  threshold,  while  reducing  the  number  of  matches,  does  not 
seem  to  result  in  better  accuracy.  This  may  indicate  that  a  non-zero  detection  thresh¬ 
old  leads  to  detection  failure  (a  feature  detected  in  one  image,  but  not  in  a  second 
image).  The  trend  with  respect  to  initial  blurring  scales  is  slightly  more  difficult  to 
interpret.  Nonetheless,  in  both  cases,  an  initial  blurring  scale  of  a  =  0.8  provides  the 
best  results. 
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(a) 


(b) 


(d) 


Figure  33.  Stereo  Matching  Performance  using  Modified  SIFT  (Author’s  Implemen¬ 
tation).  (a)  and  (b)  show  the  variance  from  ground  truth  and  number  of  matches, 
respectively,  in  terms  of  matching  threshold  and  level  of  initial  blurring.  The  multiple 
lines  at  each  blurring  level  correspond  to  different  detection  thresholds,  (c)  and  (d) 
explicitly  show  the  dependence  on  detection  threshold,  at  a  matching  threshold  of  two. 
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Figure  34.  Stereo  Matching  Performance  using  SIFT  (VLFeat).  (a)  and  (b)  show  the 
variance  from  ground  truth  and  number  of  matches,  respectively,  in  terms  of  matching 
threshold  and  level  of  initial  blurring.  The  multiple  lines  at  each  blurring  level  corre¬ 
spond  to  different  detection  thresholds,  (c)  and  (d)  explicitly  show  the  dependence  on 
detection  threshold,  at  a  matching  threshold  of  two. 
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(a)  (b) 


Figure  35.  Maps  from  Stereo  Matching  using  SIFT,  (a)  and  (b)  show  the  maps  yielded 
by  the  author’s  SIFT  implementation  and  by  VLFeat,  respectively,  using  the  optimal 
conditions  determined  from  Figs.  33  and  34,  namely,  with  crm,n  =  0.8,  Matching  Thresh¬ 
old  =  2,  and  Detection  Threshold  =  0.  VLFeat  yields  better  accuracy  at  the  cost  of 
match  count. 


Fig.  35  shows  the  depth  maps  generated  by  each  method  using  optimum  param¬ 
eters.  In  each  case,  the  minimum  blurring  was  chosen  as  =  0.8,  the  detection 
threshold  was  set  to  zero,  and  the  matching  threshold  was  set  to  2.  Though  the  au¬ 
thor’s  implementation  provides  a  much  greater  number  of  matches,  the  variance  from 
the  true  disparity  is  large  compared  to  that  achieved  using  VLFeat.  Since  the  intent 
of  this  research  is  to  assess  the  performance  limits  of  the  plcnoptic  camera,  VLFeat 
is  used  in  further  analysis.  Future  work  might  further  explore  the  tradeoffs  existing 
between  the  two  implementations,  and  how  overall  performance  could  be  optimized. 


Light  Field  Ranging  using  SIFT. 

Improved  performance  over  the  stereo-matching  results  presented  in  the  previous 
section  should  be  possible  by  using  the  intermediate  subaperture  images,  in  addition 
to  those  located  at  the  two  extremes,  to  estimate  disparity.  In  its  fullest  application, 
this  would  entail  feature  matching  between  each  subaperture  image  and  its  4  near¬ 
est  neighbors.  In  this  section,  we  deal  with  the  simplified  case  where  one  angular 
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Figure  36.  Feature  Matching  Framework.  In  stereo  ranging,  slope  estimates  are  formed 
using  only  two  images  from  the  stack.  When  the  entire  light  field  is  available,  interme¬ 
diate  images  can  be  used  to  achieve  better  estimate  via  simple  linear  regression. 

coordinate,  u,  is  varied  while  the  other,  v,  remains  fixed,  thus  giving  Nu  different 
subaperture  images. 

The  approach  employed  is  to  establish  feature  matches  between  each  subaperture 
image  and  its  two  neighbors  at  u  +  1  and  u  —  1.  These  matches  are  sorted  in  order 
to  produce  a  matrix  in  which  each  column  corresponds  to  a  feature  and  each  row 
corresponds  to  a  subaperture  image,  cell  values  giving  the  location  of  the  feature 
within  the  image. 

If  the  same  feature  is  being  accurately  detected  and  matched  within  each  image,  it 
should  follow  that  the  disparity  between  successive  images  will  be  nearly  the  same  size. 
To  remove  cases  where  features  are  improperly  matched,  we  calculate  the  variance  of 
the  disparity  in  each  column,  and  throw  out  columns  in  which  the  variance  exceeds 
a  specified  threshold. 

Fig.  36  contrasts  the  approach  taken  here  with  the  stereo  based  approach  of  the 
previous  section.  Given  increased  number  of  sample  points,  a  simple  linear  regression 
becomes  an  appropriate  approach  to  determining  the  light  field  slope.  Fig.  37a  shows, 
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(a)  (b) 

Figure  37.  Simulated  Camera  Performance  with  SIFT.  In  (a),  Nu  is  increased  according 
to  the  first  scheme  in  Fig.  27,  such  that  7  stays  constant.  The  improvement  in  accuracy 
with  Nu  falls  short  of  that  predicted  in  Eq.  85.  Also,  the  results  obtained  using  linear 
regression  do  not  significantly  improve  upon  results  obtained  via  stereo,  (b)  shows  the 
uncertainty  for  the  other  two  schemes  in  Fig.  27.  Increasing  Nu  in  either  of  these 
manners  does  not  lead  to  an  overall  improvement  in  uncertainty,  represented  by  the 
continuous  light  field  slope  error. 


for  both  the  case  of  slope  estimated  using  the  two  extreme  images  (stereo)  and  the 
case  of  slope  estimated  from  the  entire  range  of  images  (linear  regression),  how  slope 
uncertainty  diminishes  as  angular  samples  are  added  in  a  manner  corresponding  to 
the  first  scheme  in  Fig.  27. 

Interestingly,  the  performance  of  the  stereo  estimator  is  remarkably  close  to  that 
of  the  linear  regression  estimator.  A  comparison  of  Eqs.  82  and  87  indicates  that  the 
error  when  slope  is  calculated  using  the  linear  regression  should  drop  off  faster  than 
the  stereo  case  by  an  additional  factor  of  1  /  y/Nu- 

A  plausible  explanation  for  this  discrepancy  might  be  that  the  feature  localization 
error  does  not  match  the  assumption  of  a  normally  distributed  probability  density 
function.  However,  Fig.  38  illustrates  that  the  localization  error  distribution  is  fairly 
well  approximated  by  a  normal  distribution.  The  localization  error  in  the  figure  was 
calculated  by  using  the  average  of  the  feature  locations  across  all  subaperturc  images 
as  a  true  location  for  the  central  subaperture  image.  True  locations  for  the  other 
images  were  then  calculated  by  using  the  known  light  field  slope  obtained  from  the 
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Figure  38.  SIFT  Localization  Error.  Corresponds  to  the  case  pictured  in  Fig.  37a.  The 
shape  of  the  error  distribution  remains  largely  constant  across  the  range  of  subaperture 
images.  The  distribution  is  well  approximated  by  a  normal  fit.  The  fit  shown  in  (b) 
has  a  coefficient  of  determination,  r2,  of  0.96. 

ground  truth  depth  map.  Further  analysis  is  necessary  to  determine  if  the  slight 
deviation  from  the  distribution  and  its  normal  fit  shown  in  the  figure  is  sufficient  to 
account  for  the  failure  of  Fig.  37a  to  match  with  theory. 

In  Fig.  37a,  angular  samples  are  added  by  increasing  the  simulated  camera  diam¬ 
eter  D  and  decreasing  the  detector  size  A q  in  such  a  way  that  the  factor  7  =  m/m 
remains  constant.  Fig.  37b  shows  the  cases  where  angular  resolution  is  added  in  ac¬ 
cordance  with  the  two  other  schemes  in  Fig.  27.  Since  these  tradeoffs  do  not  maintain 
a  constant  7,  the  sampled  light  field  slope  error  and  continuous  light  field  slope  error 
are  shown  separately. 

Though  all  three  cases  show  a  fall-off  in  sampled  light  field  slope  error  with  in¬ 
creasing  number  of  angular  samples,  the  dependence  falls  short  of  that  described  in 
equation  83.  This,  in  turn,  leads  to  unexpected  behavior  for  the  continuous  light 
field  slope  error.  For  example,  though  changing  the  detector  size  to  increase  Nu  while 
keeping  D  constant  should  lead  to  a  decreased  uncertainty  according  to  Eq.  84,  the 
behavior  in  Fig.  37b  is  constant  with  Nu. 


78 


4.4  Range  Finding  using  Epipolar  Plane  Images 


The  feature  matching  framework  is  useful  for  a  number  of  reasons.  Feature  match¬ 
ing  using  SIFT-like  algorithms  is  a  very  commonly  employed  technique  for  determin¬ 
ing  structure  from  imagery  within  the  computer  vision  community.  Therefore,  depth 
estimation  using  SIFT  represents  an  obvious  first  approach  to  the  plcnoptic  ranging 
problem.  A  second  advantage  of  feature  matching  is  the  straightforward  uncertainty 
analysis  available  via  the  simple  linear  regression  estimator.  Finally,  as  discussed  in 
the  previous  section,  when  registration  between  images  is  linked  to  an  entity  having 
an  existence  within  the  scene  itself  (namely,  a  feature),  this  entity  can  be  localized 
to  within  subpixel  precision,  allowing  for  highly  accurate  depth  estimates. 

For  these  reasons,  it  makes  sense  to  employ  feature  matching  as  a  first  look  at 
plcnoptic  rangefinding.  However,  the  high  dimensionality  of  the  light  field  also  allows 
for  other  more  direct  methods  which,  in  their  simplicity,  afford  considerable  advan¬ 
tages  over  the  use  of  SIFT.  These  methods  operate  directly  on  either  the  light  field 
itself  or  on  Epipolar  Plane  Images  (See  section  3.4). 

Since  a  point  source  must  appear  as  a  sloped  line  within  an  EPI,  one  simple 
approach  is  to  search  for  this  line  by  looking  for  slopes  along  which  the  EPI  has  low 
variance  or  photo-consistency.  This  can  be  thought  of  as  the  equivalent  of  image 
registration  through  correlation,  applied  to  the  light  field. 

Slope  estimation  using  Light  Field  Photo-consistency. 

Fig.  39  shows  an  Epipolar  Plane  Image  (EPI).  As  discussed  in  section  3.2,  the 
EPI  is  composed  of  sloped  lines,  each  of  which  maps  to  a  single  point  within  the 
scene  corresponding  to  the  light  field,  such  that  the  slope  of  the  line  relates  to  the 
distance  to  the  point.  Now,  imagine  calculating  the  variance  of  an  EPI  along  each  of 
its  vertical  columns.  In  areas  containing  vertical  lines,  the  values  contained  within  a 
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Figure  39.  Calculation  of  Photo-Consistency.  The  figure  shows  a  portion  of  an  EPI  un¬ 
der  varying  degrees  of  shearing.  The  variance  along  the  dotted  white  line  is  minimized 
with  the  shearing  slope  matches  the  slope  of  the  lines  in  the  light  field. 

single  column  would  stay  consistent,  leading  to  a  low  variance.  On  the  other  hand,  in 
areas  where  a  column  is  crossed  by  multiple  slanted  lines,  the  variance  will  be  higher. 

This  suggests  the  approach  of  estimating  slope  by  shearing  the  light  held  by  dif¬ 
ferent  amounts,  and  looking  for  vertical  lines  identified  by  low  variance  at  each  degree 
of  shearing.  Following  this  approach  for  a  given  light  held  slice  will  result  in  an  Ns 
by  Nm  matrix  of  variance  values,  where  Ns  is  the  width  of  the  slice  and  Nm  is  the 
number  of  slopes  used  for  shearing  the  EPI. 

This  matrix  can  be  visualized  as  a  disparity  space  image,  or  DSI,  as  in  [7].  Fig. 
40  shows  a  DSI  and  depth  map  built  from  the  same  light  held.  Each  row  of  the  image 
corresponds  to  a  different  degree  of  shearing  of  the  light  held  slice.  The  value  of 
each  pixel  gives  the  variance  calculated  from  the  vertical  columns  within  the  sheared 
slice.  To  avoid  confusion  with  the  concept  of  variance  within  the  context  of  random 
variables,  the  term  photo-consistency  is  henceforth  used  in  place  of  variance  for  this 
value. 

Close  observation  reveals  that  the  DSIs  in  Fig.  40  (b)  and  (c)  are  composed  of  a 
fundamental  unit  shaped  something  like  a  bundle  of  lines  passing  through  a  central 
minimum.  The  bundles  are  especially  prominent  in  two  locations,  corresponding  to 
the  edges  of  the  ball  seen  in  the  depth  map  shown  in  the  figure.  These  bundles  are 
known  as  DSI  shadows,  since  they  seem  to  shadow  certain  points  on  the  DSI.  The 
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Figure  40.  Depth  Estimation  Using  Photo-Consistency,  (a)  shows  a  slope  map  gen¬ 
erated  using  the  photo-consistency  technique,  (b)  shows  a  DSI  corresponding  to  the 
slice  of  (a)  outlined  in  blue.  Note  the  shadows  located  at  the  edges  of  the  ball,  which 
cover  a  small  portion  of  the  line  of  minimum  values  on  each  side,  (c)  gives  the  same 
DSI,  but  generated  using  only  the  lower  angular  half  of  the  sheared  light  field.  This 
causes  the  shadows  to  tilt  inward  above,  allowing  for  a  better  estimate  of  the  occluded 
region,  (d)  and  (e)  show  how  the  error  changes  when  this  extra  measure  is  taken  in 
regions  where  occlusions  are  detected.  The  improvement  is  most  visible  in  the  upper 
left  corner  of  the  ball. 
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Figure  41.  DSI  Shadow.  Each  point  on  a  DSI  maps  to  a  line  in  the  Light  Field  along 
with  the  variance  is  taken  to  provide  the  value  of  the  DSI  point.  The  shadow  of  a 
DSI  point  of  interest  is  composed  of  all  those  points  whose  corresponding  lines  on  the 
light  field  overlap  the  line  represented  by  the  point  of  interest.  The  figure  illustrates 
that  there  will  be  a  range  of  slopes  outside  of  this  shadow,  which  increases  with  spatial 
separation  s. 


shadowed  points  are  those  corresponding  to  the  actual  slope  of  the  light  field  (vertical 
position  on  DSI)  at  a  given  location  (horizontal  DSI  position).  These  appear  as 
minima  within  the  DSI  column.  The  shadow  of  a  DSI  minimum  consists  of  all  those 
points  whose  corresponding  lines  on  the  light  held  cross  the  line  corresponding  to  the 
DSI  minimum.  The  shadow  will  be  prominent  when  the  light  held  contains  an  edge 
or  strong  gradient,  as  in  the  two  points  in  the  figure. 

Fig.  41  shows  the  range  of  sloped  lines  fh  G  (m2,  m3)  which  will  not  intersect  with 
a  line  at  slope  fh\  located  a  distance  s  away.  Solving  the  simple  system  yields 


m2, 3  =  mi  T  2 s/Nq.  (91) 

The  shadow  region  consists  of  those  lines  outside  of  the  range  (m2, m3),  or  those 
satisfying 

\fh  —  fhi\  >  2-^-  (92) 

This  equation  defines  two  fingers  which  meet  at  fh\  where  s  —  0,  and  recede  toward 
the  DSI  edges  as  s  increases.  This  is  the  shape  seen  in  Fig.  40. 

DSI  shadows  can  have  unwanted  effects  near  object  edges.  In  this  context,  the 
DSI  shadow  represents  the  effect  of  one  object  occluding  another  over  a  range  of 
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subaperture  images.  Notice  in  the  first  DSI  in  Fig.  40  that  the  line  of  minimum 
points  on  the  left  of  the  image  seems  to  disappear  behind  the  shadow  corresponding 
to  the  edge  of  the  ball.  Various  sophisticated  approaches  are  possible  for  dealing 
with  such  cases.  The  approach  employed  here  is  a  modification  of  that  described  in 
[7],  wherein  successively  distant  ‘tubes’  of  equidistant  portions  of  the  light  field  are 
extracted  prior  to  recalculating  the  DSI  in  an  iterative  process.  In  the  context  of 
a  plenoptic  camera,  the  extent  of  occlusions  is  generally  more  limited  than  for  the 
EPIs  in  [7],  which  were  collected  by  a  track  mounted  camera.  Even  in  the  presence 
of  occlusions,  we  can  normally  count  on  having  at  least  one  half  of  the  light  held 
occlusion  free.  Therefore,  a  possible  approach  is  to  1)  detect  occlusions,  2)  determine 
the  ‘directionality’  of  the  occlusion,  and  3)  use  a  precalculated  DSI  generated  from 
the  occlusion-free  half  of  the  light  held  to  estimate  slope. 

A  region  is  considered  to  be  occluded  if  the  variance  generated  from  a  given  half 
of  the  EPI  is  less  than  the  variance  generated  from  the  full  EPI  by  a  specihed  factor, 
and  if  the  location  of  the  two  minima  are  separated  by  more  than  a  specihed  offset. 
A  factor  of  10  and  a  0.1  offset  threshold  yielded  good  results. 

The  second  DSI  in  Fig.  40  was  generated  by  using  the  bottom  half  of  the  DSI. 
Notice  how  the  DSI  shadows  bend  inward,  exposing  minima  which  were  previously 
covered.  The  two  error  maps  in  the  figure  show  how  this  correction  leads  to  reduced 
error  at  object  boundaries,  although  the  degree  of  the  improvement  is  variable. 

Though  DSI  shadows  can  cause  problems  at  object  boundaries,  in  general,  the 
shadow  helps  ‘frame’  the  DSI  minimum,  ensuring  that  it  will  be  easily  detected.  This 
is  especially  true  when  the  image  gradient  is  high.  In  general,  we  expect  that  slope 
estimation  performance  will  be  dependent  on  gradient  scale,  especially  in  the  presence 
of  noise. 
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Sampled  Light  Field  Slope  Uncertainty. 


To  formalize  these  observations,  we  consider  the  case  of  a  light  held  slice  composed 
of  a  horizontal  gradient  with  additive  Gaussian  noise, 

L(s,  u)  —  gs  +  n(s,  u)  (93) 

where  n  ~  N(0,cr2)  is  a  zero  mean,  normally  distributed  random  variable  having 
variance  a2,  and  g  is  the  spatial  gradient  (gradient  within  a  subaperture  image): 

(1L 
9  ~  ds 

For  the  sampled  light  held,  L(s,u )  =  gs  +  n,  we  assume  that  the  gradient  is  scaled 
according  to  the  simple  relation, 


g  =  —As  =  —AqNu,  (95) 

although  we  will  later  see  that  this  assumption  is  not  completely  valid,  and  that  a 
robust  treatment  of  gradient  scaling  under  the  effect  of  sampling  is  a  difficult  problem 
worthy  of  greater  attention.  Next,  we  define  a  sloped  slice  of  the  sampled  light  held, 
Sm(u),  as 

Sm(u)  —  L(fhu,  u),  u  G  [~(NU  -  l)/2,  (Nu  -  l)/2] .  (96) 

A  method  of  interpolation,  discussed  in  more  detail  later,  is  used  to  provide  values 
of  L  at  non-integer  values  of  s  —  mu.  We  want  the  photo-consistency  of  this  slice, 
which  we  define  as 

Vfn  =  var(5'm)  =  var  (gifiu))  +  var(n)  =  var  (gmu)  +  a2,  (97) 
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which  follows  because  variance  is  a  linear  operator  when  the  variables  being  summed 
are  uncorrelated.  The  s  subscript  is  added  to  cr  to  annotate  that  this  is  the  sample 
variance,  and  not  the  variance  of  the  normal  distribution  used  to  define  the  ran¬ 
dom  variable,  n.  This  will  become  of  importance  later  on.  The  variance  of  gfhu  is 
calculated  as 

2  (JV„-i)/2  2 fh2g2  ( Nu~ b/2 

var (gfhu)  =  —  —  -  ^  (gfhu-  <  gfhu  >)2  =  —  —  -  ^  vi2  (98) 

“  -(Nu~  l)/2  “  «=1 

which  results  because  the  mean  value  <  u  >=  0.  Since  Y^c=i  x2  =  n(n  +  l)(2n+  l)/6 
[30],  it  follows  that 

ffi2a2 

var  (gfhu)  =  -^-NU(NU  +  1)  (99) 

and 

ff)  2 

K-n  =  -^|-^„(^1  +  1)  +  CT2.  (100) 

Where  cr  =  0  this  equation  defines  a  parabola  in  m,  where  the  concavity  of  the 
parabola  increases  with  the  number  of  angular  slices  Nu  and  decreases  as  the  gradient 
becomes  shallower.  Fig.  42  compares  two  parabolas  generated  using  Eq.  100  and 
by  actually  shearing  a  gradient  image  and  calculating  photo-consistency.  The  two 
parabolas  are  visually  identical. 

When  cr  >  0,  the  parabola  will  be  affected  by  noise  related  to  variation  in  the 
sample  variance  of  n,  of: 


var  (Vfh)  =  var 


m2g2 

12 


NU(NU  +  1)  I  +  var(<72)  =  va r(u2) 


(101) 
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Figure  42.  Photo-Consistency  of  Gradient  without  Noise.  The  plots  were  generated 
using  Eq.  100  and  through  direct  simulation.  The  plots  are  visually  identical. 


where  the  sample  variance,  a2  (equivalently  understood  as  the  photo-consistency  of 
the  noise  component  of  the  light  held),  expands  to 


{rii  -  y)2 

We  approximate  that  the  sample  mean,  //.  is  equal  to  zero.  Since  var(a:r  +  by)  = 
a2var(;r)  +  62var(t/),  where  a  and  b  are  constants  and  x  and  y  are  uncorrelated  random 
variables,  the  variance  operator  can  be  brought  inside  of  the  summation: 


Nu 


var(cr2)  =  var 


N,. 


E 

i= 1 


var 


Nu 


(nu  - 1)2  tr 


J^var(n2)  = 


(nu  -  iy 


-var  n 


(103) 


where 


n2  f(n)dn  =  3a4 


(104) 


and  f[n)  =  iV(0,  a)  is  the  probability  density  function  for  n.  The  integral  is  solved 
via  integration  by  parts.  Combining  Eqs.  101,  103,  and  104  gives  the  variance  of  the 
photo-consistency, 


Nu 

(Nu  -  l)2 
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3cr4 


var  (Vn) 


(105) 
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Figure  43.  Photo-Consistency  of  Gradient  with  Noise.  Plots  were  generated  using  Eq. 
106  with  increasing  amounts  of  Gaussian  noise. 


This  allows  us  to  modify  Eq.  100  by  expanding  the  sample  variance,  as  in 


V-  = 

•  m, 


m2g2 

12 


NU(NU  +  1)  +  u2  +  P 


(106) 


where  p  ~  1V(0,  cr2)  is  a  zero  mean,  normally  distributed  random  variable  with  stan¬ 
dard  deviation  ap  =  \/3 /Nucr2. 

Fig.  43,  shows  a  range  of  DSI  slices  generated  using  Eq.  106.  Though,  as  will  be 
shown  below,  this  is  not  an  accurate  representation  of  the  results  of  shearing  a  noisy 
gradient,  it  does  provide  a  simple  framework  for  thinking  about  the  effects  of  noise  on 
slope  estimation.  From  the  figure,  it  is  clear  that,  in  the  presence  of  too  much  noise, 
the  minimum  of  the  DSI  may  no  longer  exist  at  the  vertex  of  the  parabola,  leading 
to  a  faulty  estimate. 

In  order  to  quantify  this  effect,  we  determine  how  far  the  slope  fh  must  change  in 
order  for  the  corresponding  change  in  Vm  to  equal  the  standard  deviation  ap  of  Vm- 


V Am  V 0  —  Op- 


(107) 
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Figure  44.  Simulated  Photo-Consistency  of  Gradient  with  Noise.  Plots  were  generated 
via  direct  simulation  using  noisy  gradients.  Deviation  from  the  noiseless  parabola  in 
Fig.  42  is  not  uncorrelated  with  changing  slope,  as  in  Fig.  43. 


Beyond  this  point,  we  consider  that  it  is  unlikely  for  noise  to  produce  a  false  minimum. 
The  equation  solved  by 


AmJ  /  12^ 

g\J  NU(NU  +  1) 


a  I  12  [ 3“_  4.5a 


(108) 


This  indicates  a  strong  dependence  in  ranging  uncertainty  on  the  intensity  of  noise, 
the  scale  of  any  gradient  features,  and  the  number  of  angular  samples. 

Fig.  44  shows  some  photo-consistency  curves  resulting  from  shearing  a  noisy 
gradient  image  over  a  range  of  slopes.  There  is  little  resemblance  to  the  plots  in  Fig. 
43.  The  reason  for  this  is  that  Eq.  106  is  derived  from  the  assumption  that  both 
g(s)  and  n(s,u )  exist  on  a  continuous  space  in  s.  This  is  to  allow  for  the  shearing 
operation,  since  the  angled  slice  Sm  =  L(mu,u )  calls  for  evaluation  of  L  at  arbitrary 
real  values  of  s.  In  reality,  L(s,u )  is  an  image  having  a  finite  number  of  samples, 
and  interpolation  is  required  to  shear  at  arbitrary  slopes.  The  end  result  is  that  the 
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Figure  45.  Photo-Consistency  of  Gaussian  Noise.  Plots  were  generated  using  varying 
amounts  of  Gaussian  noise,  with  no  gradient.  Using  linear  interpolation  to  perforin 
shearing  results  in  cusps  at  certain  points,  particularly  m  =  0,  where  the  effective  size 
of  the  interpolation  kernel  goes  to  unity.  Using  an  interpolation  kernel  that  maintains 
a  more  constant  size  helps  eliminate  these  cusps. 


variance  of  the  photo-consistency  of  any  given  slice  of  an  image  will  obey  Eq.  106 
across  a  set  of  images  having  different  noise  content.  However,  in  a  particular  image, 
the  photo-consistency  of  a  slice  at  one  slope  will  not  be  uncorrelated  from  the  photo¬ 
consistency  at  nearby  slopes.  This  explains  why  the  plots  in  Fig.  44  show  deviation 
from  the  ideal  parabola,  but  not  in  the  quickly  varying  manner  of  Fig.  43. 

A  second  noteworthy  feature  of  the  plots  in  Fig.  44  is  the  cusp  appearing  at  rh  =  0 
in  the  two  rightmost  plots.  This  cusp  can  be  seen  more  clearly  in  Fig.  45,  where  the 
photo-consistency  of  only  the  noise  component  is  plotted.  The  central  maximum 
for  the  curves  generated  using  linear  interpolation  is  explained  by  the  fact  that  at 
rh  =  0,  no  interpolation  is  needed.  Since  interpolation  has  the  effect  of  convolving 
the  noise  and  thus  reducing  its  variance,  the  absence  of  interpolation  at  rh  =  0  leads 
to  a  heightened  variance  compared  to  points  where  interpolation  is  employed.  The 
other  cusps  are  likely  located  at  points  where  the  need  for  interpolation  is  minimal. 
In  order  to  ameliorate  this  effect,  we  employ  a  method  of  interpolation  which  uses 
a  convolution  kernel  that  maintains  a  constant  size.  The  figure  illustrates  how  this 
is  effective  in  removing  the  cusps,  although  it  also  has  the  effect,  mostly  benign,  of 
reducing  the  variance  everywhere  by  some  factor. 


Figure  46.  Estimation  Uncertainty:  Standard  deviation  of  slope  estimate  from  true 
value  as  a  function  of  Gradient  Scale  ( Ng )  and  Number  of  Angular  Samples  ( Nu ),  at 
two  different  noise  levels,  (a)  and  (c)  give  simulation  results,  while  (b)  and  (d)  give 
the  results  predicted  by  Eq.  108.  The  analytic  result  is  not  formulated  as  a  standard 
deviation,  but  it  scales  in  the  same  way.  The  values  in  (b)  and  (d)  in  this  figure  and  in 
Fig  47  haven  been  divided  by  a  factor  of  2.5  in  order  to  make  comparison  easier. 


In  order  to  verify  slope  uncertainty  defined  by  Eq.  108,  simulations  were  performed 
by  shearing  a  gradient  of  variable  scale,  having  a  variable  number  of  angular  slices, 
and  subject  to  Gaussian  noise  of  varying  standard  deviation. 

Figs.  46  and  47  illustrate  that  the  relationship  in  Eq.  108  remains  valid,  despite 
the  differences  between  the  assumptions  made  in  its  derivation  and  the  actual  behavior 
of  the  sheared,  noisy  gradient  illustrated  in  Fig.  45.  The  figures  illustrate  that, 
although  the  number  of  angular  samples  does  not  appear  to  be  highly  important  at 
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Figure  47.  Estimation  Uncertainty  (cont.):  Standard  deviation  of  slope  estimate  from 
true  value,  (a)  and  (b)  give  analytic  and  simulation  results,  respectively,  as  the  number 
of  angular  samples  ( Nu )  and  noise  (a)  are  varied,  and  with  a  gradient  scale,  Ng  =  5.  (c) 
and  (d)  give  the  results  at  Nu  =  9,  while  varying  noise  and  gradient  scale. 
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low  noise  conditions,  at  higher  noise  levels,  an  increased  number  of  angular  samples 
results  in  much  better  accuracy. 


Continuous  Light  Field  Slope  Analysis. 


The  analysis  of  the  previous  section  focuses  on  uncertainty  in  estimating  the  slope 
of  the  sampled  light  field.  Though  this  analysis  is  valid  within  that  context,  it  is 
insufficient  to  assess  trade-offs  in  plcnoptic  camera  construction  since  it  ignores  the 
way  that  the  sampled  light  field  slope  itself  is  affected  by  the  changes  in  sampling 
characteristics  brought  about  by  changing  camera  parameters.  In  this  section,  those 
effects  are  considered. 

The  sampled  light  field  slope  and  the  continuous  light  field  slope  are  related  by 
the  ratio  7,  as  in 


A  u  D/Nu  D  1 

m  =  my  =  m— —  =  mn— — —  =  m— — — . 
'  As  NuAq  A q  Nl 


(109) 


We  assume  that  a  sampled  spatial  gradient  is  scaled  by  the  sampling  interval,  As,  as 


m 


dL  dL 
9  =  ~d^A'S  =  lb 

Upon  making  these  substitutions  into  the  photo-consistency  equation,  Eq.  106,  the 
continuous-slope  photo-consistency,  Vm,  is  given  by 


A  qNu 


(110) 


g2m2D2  2 

Vm  =  - — -  +  O'  +  P. 


(Ill) 


The  slope  uncertainty,  Am  is  then  given  by 


Am  ~ 


4.5  <7 
Nuf 4  9d' 


(112) 
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The  most  important  difference  between  this  equation  and  Eq.  108  is  the  dependence 
on  Nu.  If  the  main  lens  diameter  D  is  held  constant,  the  improvement  gained  by 
adding  angular  samples  is  only  the  small  Nu  1//4  dependence  related  to  an  improved 
signal  to  noise  ratio.  The  equation  implies  that  the  pixel  size  A q  is  not  directly  related 
to  uncertainty.  Thus,  the  impact  of  increasing  Nu  by  expanding  the  microlens  size 
As  or  by  decreasing  the  pixel  size  A q  should  be  similar.  In  the  next  section,  these 
dependencies  are  verified  experimentally  within  the  context  of  a  synthetic  light  field. 

Experimental  Results. 

Fig.  48  shows  depth  maps  generated  using  the  photo-consistency  technique  under 
differing  noise  conditions,  along  with  associated  DSI  images  for  a  portion  of  the  scene. 
Notice  that,  as  the  noise  level  increases,  the  DSI  minima  become  less  distinct  until 
reaching  a  point  where  they  are  difficult  to  identify.  This  leads  to  the  noisy  behavior 
seen  in  the  depth  maps  themselves. 

Fig.  49  shows  the  photo-consistency  as  noise  increases  in  a  non-logarithmic  scale 
for  three  different  points  selected  from  the  scene  in  Fig.  48.  In  all  three  cases,  the  plot 
starts  out  having  a  parabolic  shape.  This  validates  the  previous  section’s  prediction 
that  photo-consistency  curves  should  take  on  the  shape  of  a  parabola  in  the  vicinity 
of  a  minimum.  As  the  amount  of  image  noise  increases,  the  photo-consistency  curve 
itself  becomes  noisy,  leading  to  detection  of  false  minima.  Where  the  image  gradient 
is  stronger  (leftmost  plot),  the  parabola  is  steeper,  making  it  harder  for  variations 
due  to  image  noise  to  create  a  false  minimum  of  significant  magnitude.  This  is  why, 
in  Fig.  48,  even  under  large  noise  conditions,  strong  edges  like  those  of  the  die  and 
plank  remain  well  resolved. 
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Figure  48.  Slope  Maps  From  Noisy  Light  Fields.  As  noise  increases,  the  DSI  minimum 
becomes  increasingly  poorly  defined,  leading  to  the  noisy  slope  estimates  seen  in  the 
maps. 
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Figure  49.  Photo-Consistency  with  Noise.  Plots  are  associated  with  the  indicated 
points  of  the  light  field  in  Fig.  48  (a).  The  plots  grow  more  erratic  as  noise  increases. 
Where  the  image  gradient  is  higher,  photo-consistency  minimums  stay  more  localized 
under  the  effects  of  noise. 


Simulated  Camera  Analysis:  Varying  Lens  Diameter. 

This  chapter’s  introduction  outlines  a  number  of  ways  in  which  the  synthetic  light 
fields  provided  by  HCI  can  be  resampled  to  simulate  changes  to  the  plenoptic  camera 
configuration.  The  simplest  of  these  is  illustrated  in  the  first  column  of  Fig.  27.  This 
corresponds  to  increasing  the  camera  lens  diameter  while  decreasing  the  detector  sizes. 
The  transformation  increases  camera  lens  diameter  while  decreasing  the  detector  sizes 
in  a  manner  that  maintains  the  ratio  7  =  Au/ As.  Since  7  relates  the  continuous  and 
sampled  light  field  slopes,  keeping  7  constant  means  that  the  behavior  of  arn  will 
mimic  that  of  a m  under  changes  in  Nu  brought  about  by  this  transformation. 

Fig.  50  shows  <7^  under  the  impact  of  varying  noise  and  changing  number  of 
angular  samples  in  the  manner  described  above.  According  to  the  model  derived 
above,  uncertainty  should  grow  linearly  with  noise  and  fall  off  with  1/A^4.  However, 
performing  an  exponential  fit  to  the  results  shows  that  the  growth  with  noise  is  closer 
to  ydx  and  the  fall-off  under  changing  Nu  closer  to  1  / y/N^. 

A  clue  to  the  reason  for  these  discrepancies  is  found  within  the  slope  maps  in  Fig. 
48.  Certain  regions  of  the  map  quickly  display  estimation  noise  having  an  amplitude 
that  spans  the  entire  possible  range  (from  the  smallest  possible  to  greatest  possible 
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Figure  50.  Experimental  Slope  Uncertainty,  Varying  Lens  Diameter:  Effects  of  increas¬ 
ing  the  number  of  angular  samples  while  holding  As  and  An  constant.  This  requires 
simultaneously  increasing  the  lens  diameter  and  decreasing  the  detector  size.  The  expo¬ 
nential  fit  in  (d)  follows  N~0A8a~0  56 ,  rather  than  the  expected  N~125a~1 .  Discrepancies 
with  theory  are  likely  due  to  the  faulty  assumption  of  ’infinite’  gradients. 
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slope).  Adding  more  noise  to  the  light  held  therefore  makes  no  difference  in  the 
estimation  noise  seen  in  the  depth  map  for  such  regions. 

This  observation  is  explained  in  the  following  manner:  in  developing  the  analytic 
model,  we  treat  the  gradients  involved  as  having  indefinite  extent.  Throughout  the 
section,  photo-consistency  plots  are  shown  as  parabolas  of  indefinite  extent,  corre¬ 
sponding  to  the  assumption  of  an  indefinitely  extensive  gradient.  However,  in  a  true 
scene  or  image,  most  gradients  exist  as  part  of  edges  or  patterns  of  textures.  For 
such  cases,  as  the  shearing  slope  moves  away  from  actual  light  held  slope,  the  photo¬ 
consistency  will  typically  reach  a  noise  hoor  at  which  parabolic  behavior  ceases.  Once 
the  variation  in  the  photo-consistency  induced  by  light  held  noise  reaches  a  magni¬ 
tude  exceeding  this  hoor,  the  photo-consistency  minimum  is  liable  to  jump  outside  of 
the  parabolic  region,  resulting  in  an  error  which  spans  the  entire  possible  range. 

It  follows  that,  as  noise  levels  increase,  a  large  component  of  the  increase  in  error  is 
due  to  additional  samples  entering  this  regime  of  where  error  is  more  or  less  uniformly 
distributed  across  the  entire  possible  range.  At  some  point,  a  saturation-like  behavior 
must  take  place  as  the  number  of  locations  displaying  this  behavior  approaches  the 
total  number  of  locations,  and  the  number  of  opportunities  for  new  instances  of  the 
behavior  to  come  about  diminishes.  This  likely  explains  why,  in  Fig.  50,  uncertainty 
appears  to  grow  linearly  with  noise  for  a  while  before  reaching  a  point  where  growth 
diminishes. 

Fig.  51  illustrates  that,  though  adding  additional  angular  samples  increases  the 
concavity  of  the  parabola  within  the  parabolic  region  of  the  photo-consistency  curve, 
it  does  not  significantly  impact  height  of  the  floor  surrounding  the  parabolic  region. 
This  explains  why  the  fall-off  in  Nu  is  less  than  expected:  once  a  location  begins  to 
display  unbounded  behavior,  increasing  Nu  does  little  to  improve  estimation  accuracy. 
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Figure  51.  The  Effect  of  Changing  Number  of  Subapertures  on  Photo-Consistency. 
Increasing  the  number  of  subapertures  causes  the  parabolic  region  of  the  curve  to 
become  tighter.  But  the  floor  outside  of  the  parabolic  region  actually  drops. 

Simulated  Camera  Analysis:  Varying  Detector  Size. 

Resampling  the  light  held  according  to  the  second  scheme  in  Fig.  27  corresponds  to 
changing  the  size  of  the  detector  elements  while  keeping  all  other  parameters  constant. 
The  effect  of  this  variation  is  shown  in  Fig.  52.  The  continuous  light  held  slope  error 
appears  to  decrease  linearly  as  Nu  increases.  However,  when  an  exponential  ht  is 
performed,  the  falloff  comes  close  to  the  predicted  (1  /A/)0'25  dependence,  with  the 
best  ht  falling  oh  with  (1  /NU)0A7 .  Similarly,  the  sampled  light  held  slope  error  falls 
of  as  (1/AU)115  compared  to  the  theoretical  prediction  of  [1/Nu)l:2b. 

Simulated  Camera  Analysis:  Spatial/ Angular  Trade-off. 

The  effect  of  changing  the  microlens  size  As  is  of  particular  interest  because  it 
induces  a  tradeoff  between  angular  and  spatial  sampling  density,  as  seen  in  the  third 
scheme  of  Fig.  27. 

Fig.  53  compares  DSI  images  generated  from  light  helds  within  the  tradespace  at 
Nu  =  3  and  Nu  =  9,  subjected  to  Gaussian  noise.  Though  the  images  appear  to  be 
impacted  differently  by  the  light  held  noise,  it  isn’t  clear  that  one  better  highlights  the 
minima  associated  with  correct  slope.  The  second  and  fourth  images  show  banding 
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Figure  52.  Experimental  Slope  Uncertainty,  Varying  Detector  Size:  Effects  of  increas¬ 
ing  the  number  of  angular  samples  Nu  by  decreasing  detector  size,  (a)  through  (c) 
show  continuous  light  field  slope  error,  while  (d)  shows  sampled  light  field  slope  error. 
The  best  fit  for  (c)  follows  iV“115,  while  the  fit  to  (d)  follows  N~0-17 ,  in  good  agreement 
with  the  theory. 
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Figure  53.  Comparison  of  DSIs  Generated  from  Noisy  EPIs.  (a)  and  (b)  use  a  light 
field  having  three  angular  samples,  while  (c)  and  (d),  have  nine  angular  samples,  and 
correspondingly  lower  spatial  resolution,  (a)  and  (c)  use  a  Gaussian  interpolation  kernel 
with  a  1  pixel  standard  deviation  and  having  a  constant  size  of  three  pixels,  while  (b) 
and  (d)  use  linear  interpolation.  Where  linear  interpolation  is  used,  cusp  artifacts  are 
apparent  as  horizontal  stripes. 
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Figure  54.  Experimental  Slope  Uncertainty,  Varying  Microlens  Size:  Effects  of  angu¬ 
lar/spatial  resolution  trade-off  achieved  by  varying  microlens  size  on  slope  uncertainty 
under  presence  of  noise.  The  increase  in  performance  with  angular  resolution  is  mini¬ 
mal. 

artifacts  associated  with  linear  interpolation,  which  are  suppressed  by  the  use  of  a 
Gaussian  interpolation  kernel  rather  than  linear  interpolation,  as  discussed  above. 

The  estimation  performance  is  quantified  in  Fig.  54,  which  shows  the  error  under 
different  noise  conditions  as  As  is  altered  (effecting  a  change  in  Nu).  The  improvement 
in  performance  with  Nu  in  this  case  is  hardly  noteworthy.  To  understand  why  this  is 
the  case,  it  is  necessary  to  start  by  assessing  the  agreement  between  the  sampled  light 
field  slope  error  and  Eq.  108.  Next,  the  transformation  between  continuous  light  field 
quantities  and  sampled  quantities  must  be  examined. 

Fig.  55  shows  the  sampled  light  field  slope  error  as  a  function  of  image  gradient,  g , 
and  number  of  angular  samples,  Nu.  Viewing  uncertainty  in  terms  of  image  gradient  is 
necessary  in  this  case,  since  image  gradient  is  not  expected  to  remain  constant  under 
the  spatial  resampling  involved  in  altering  microlens  size.  The  best  fit  to  the  data  is 
noteworthy  because  the  dependence  on  Nu  is  a  more  dramatic  N~2  than  the  expected 
Nu5/l  falloff.  This  is  somewhat  surprising  considering  the  fact  that  the  continuous 
light  field  slope  error  shows  almost  no  dependence  on  Nu.  The  stronger  dependence 
than  expected  is  slightly  disconcerting  and  hints  at  some  form  of  systematic  error. 
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Figure  55.  Experimental  Slope  Uncertainty,  Varying  Microlens  Size  (cont.):  (a)  shows 
the  sampled  light  field  slope  estimation  error  as  a  function  of  image  gradient  and 
number  of  angular  samples  for  noise  level  a  =  0.4.  The  exponential  fit,  shown  in  (b) 
follows  N~2a~0'37 .  Though  the  dependence  on  Nu  is  greater  than  expected,  it  does  not 
result  in  a  strong  dependence  for  the  continuous  slope  error  in  Fig.  54.  (c)  shows  the 
distribution  of  the  image  gradient  as  As  increases.  The  distribution  shifts  slowly  to  the 
right,  but  not  nearly  to  the  extent  expected  according  to  the  simple  scaling  assumption 
that  g  =  gAs,  as  shown  in  (d). 
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The  explanation  possibly  involves  the  uniform  estimation  error  behavior  discussed 
above,  wherein  the  average  error  becomes  coupled  to  the  separation  of  the  bounds 
of  the  shearing  slope  used  to  generate  the  DSI.  If  the  same  continuous  slope  range 
is  used,  this  separation  does  diminish  with  a  1/iV^  dependence  for  the  sampled  light 
field  slope. 

That  the  strong  dependence  on  Nu  seen  here  does  not  result  in  a  stronger  depen¬ 
dence  in  Fig.  54  is  due  to  two  discrepancies  between  the  gradient-related  behavior 
assumed  in  the  development  of  Eq.  112  and  the  behavior  observed  in  Fig.  55.  Most 
importantly,  the  falloff  of  error  with  image  gradient  displays  a  relationship  closer  to 
i?-1/3  than  the  expected  g _1. 

A  second  discrepancy  worth  noting  involves  the  scaling  of  image  gradients  under 
the  impact  of  spatial  resampling.  In  the  derivation  of  Eq.  112,  it  was  assumed  that 
continuous  gradients  would  relate  to  sampled  gradients  according  to  g  —  gAs.  Fig. 
55  shows  the  distribution  of  gradients  as  resampling  takes  place  to  simulate  altering 
the  microlens  size.  As  As  increases,  the  distribution  gradually  shifts  to  the  right. 
However,  under  the  assumption  that  gi/g2  =  Asi/As2  =  a,  we  note  that 

f{92)dg2  =  -f  (—)  dgi.  (113) 

Fig.  55  shows  a  fit  of  the  distribution  at  Nu  =  3  to  the  distribution  at  Nu  =  9  using 
the  transformation  shown  here.  For  the  best  fit,  a  =  1.3.  However,  the  expected 
value  for  a  over  this  range  is  3.  The  figure  also  shows  what  the  distribution  would 
look  like  if  the  assumption  about  gradient  scaling  had  been  correct.  In  theory,  the 
updated  relation  g  ~  0.3gAs  should  lead  to  an  increased  uncertainty  for  any  type  of 
sampling,  without  affecting  the  dependence  on  Nu. 

However,  the  altered  fall-off  in  g  has  greater  ramifications.  To  show  this,  we  start 
with  an  empirical  model  which  reflects  the  altered  dependencies  seen  if  Fig.  55,  given 
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by 


(114) 


Am 


4.5a 

JVM3' 


Upon  substituting  Am  =  Am/7  and  S'  =  S' As/3,  the  equation  for  continuous  quan¬ 
tities  is 


Am 


6.5cr  Ag2/3 


(115) 


DNi,3g'/3' 

Due  to  the  altered  gradient  scaling,  the  iV“2  dependence  for  the  sampled  case  is 

1  /3 

reduced  to  a  very  mild  AV  dependence  for  the  continuous  case. 


4.5  Range  Finding  via  Refocusing 

Depth  through  Refocusing. 

The  ability  to  produce  refocused  imagery  is  one  of  the  most  striking  capabilities 
latent  in  the  light  field  captured  by  the  plcnoptic  camera.  When  properly  focused 
on  an  object  within  a  scene,  an  image  will  be  characterized  by  sharp  edges,  steep 
gradients,  and  comparatively  large  amounts  of  energy  in  high  spatial  frequencies. 
Thus,  a  natural  approach  to  range  finding  is  to  search  for  refocused  images  containing 
these  characteristics. 

In  the  spatial  domain,  this  means  producing  a  stack  of  refocused  images  and 
determining  in  which  frame  the  image  gradient  reaches  a  maximum  at  each  pixel. 
This  approach  bears  strong  similarity  to  the  photo-consistency  approach  discussed  in 
the  previous  section.  Indeed,  refocusing  involves  the  same  shearing  operation  used 
there  to  identify  the  slope  of  an  EPI.  While  the  photo-consistency  approach  looks 
for  low  variance  in  the  samples  that  will  be  summed  together  to  make  up  the  final 
image  pixel,  the  depth-through-refocusing  technique  first  performs  this  summation, 
and  then  looks  for  the  high  spatial  gradients  that  are  made  possible  when  accurate 
refocusing  minimizes  the  effective  point  spread  function  of  an  object  point.  When 
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Figure  56.  Point  Spread  Function,  ID.  When  image  formation  is  performed  via  pro¬ 
jection  of  the  light  field,  sloped  lines  within  the  light  field  (a)  will  be  spread  across  a 
range  of  image  pixels.  This  distribution,  which  can  be  envisioned  as  the  histogram  in 
(b),  determines  point  spread  functions  for  objects  at  this  depth,  visualized  in  (c). 


the  photo-consistency  (variance)  is  low,  this  is  because  the  samples  associated  with 
a  single  object  point  are  gathered  together  under  a  single  image  pixel,  rather  than 
spread  across  neighboring  pixels  where  they  would  reduce  spatial  image  gradients. 

In  order  to  consider  the  effects  of  defocus  on  imagery,  it  is  necessary  to  know  the 
defocus-induced  point  spread  function.  Fig.  56  illustrates  the  formation  of  the  PSF 
for  a  two  dimensional  slice  of  the  light  held.  A  defocused  point  is  represented  by  a 
sloped  line  within  the  light  held,  and  the  image  formation  operation  projects  the  line 
down  into  one  dimension.  The  projection  of  the  line  is  then  equal  to  the  point  spread 
function  for  an  object  at  the  distance  giving  a  line  of  that  slope.  The  projection 
operation  consists  of  counting  up  the  number  of  u  samples  associated  with  each  s 
sample.  This  same  approach  can  be  utilized  for  the  case  of  the  2D  PSF  generated 
from  the  4D  light  held,  as  illustrated  in  Fig.  57.  The  light  held  slope  dictates  the 
range  of  ( u ,  v )  samples  over  which  each  ( s,t )  sample  is  spread.  The  resulting  PSF  is 
a  disk  with  radius  r  =  mNuAu/2As  =  mD /2As. 
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Figure  57.  Point  Spread  Function.  The  projection  required  to  form  the  2D  PSF  is 
visualized  more  easily  as  counting  up  the  number  of  angular  samples  associated  with 
each  spatial  sample. 


Fourier  Domain  Ranging. 

Ranging  in  the  Fourier  domain  attempts  to  determine  the  planes  in  object  space 
which  contain  objects  by  searching  for  slices  of  the  Fourier  transformed  light  held 
which  contain  large  amounts  of  high  spatial  frequency  content.  In  general,  this  in¬ 
volves  weighting  spectral  intensities  by  some  increasing  function  of  spatial  frequency, 
and  then  summing  to  provide  an  image  sharpness  metric.  A  simple  linear  weighting 
was  found  to  yield  good  results. 

As  demonstrated  in  the  next  section,  this  method  can  provide  good  results  when 
there  is  only  one  object  in  the  scene.  However,  when  multiple  objects  having  different 
depths  exist  in  the  scene,  the  method  can  have  difficulty  distinguishing  them.  This  is 
because  the  sharpness  metric  curve  associated  with  a  single  object  can  be  very  broad, 
with  energy  slowly  receding  into  lower  spatial  frequency  regions  as  an  object  becomes 
out  of  focus.  These  curves  can  blend  together  or  overwhelm  one  another,  such  that 
planes  containing  objects  do  not  appear  as  local  maxima  in  the  curve. 

Fourier  domain  ranging  is  highly  sensitive  to  aliasing  and  other  Fourier  reconstruc¬ 
tion  artifacts.  When  simple  cubic  interpolation  is  used  to  slice  an  image  spectrum  for 
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the  transformed  light  field,  these  artifacts  manifest  as  high  spatial  frequency  content 
peaking  at  m  =  0,  and  dying  away  as  \m\  increases.  This  can  cause  the  approach  to 
fail  in  the  same  way  that  it  is  liable  to  fail  for  multiple  objects  at  different  depths. 
Reconstruction  using  a  Kaiser-Bessel  filter,  as  discussed  in  [11],  was  found  to  amelio¬ 
rate  this  effect.  However,  the  recommended  zero-padding  of  the  light  field  prior  to 
Fourier  transforming  was  observed  to  introduce  a  harmful  ‘ringing’  effect. 


Fourier  Ranging  Resolution. 

In  this  section,  we  develop  a  simple  model  for  estimating  the  depth  resolving 
capability  attainable  via  ranging  in  the  Fourier  domain.  Because  Fourier  domain 
ranging  is  a  global  operation,  it  is  easy  for  small  objects  to  be  overwhelmed  by 
more  dominant  objects  in  the  scene.  In  this  section,  we  address  the  cause  of  this 
phenomenon,  and  provide  a  depth  resolution  expression  for  two  objects  in  a  scene 
having  about  the  same  size  and  characteristics. 

To  start  out  generally,  we  assume  that  the  scene  consists  of  N\  delta  functions 
at  slope  mi  and  N2  delta  functions  at  m2.  Based  on  the  discussion  of  the  previous 
section,  the  refocused  image  will  be  given  by  a  depth  dependent  blurring  of  the  scene. 
For  simplicity,  we  replace  the  disc-shaped  Point  Spread  Function  of  the  previous 
section  with  a  Gaussian  kernel  having  a  =  rj 2,  given  by 


k(x,y) 


1 


exp 


(V  +  iQ 


(116) 


The  refocused  image  is  given  by  the  convolution  of  the  delta  functions  with  the 
appropriate  Gaussian  kernel. 


iVi  n2 

im(x,y )  =  ^5(x  -  x],y  -  y])  *  fa(x,y)  +  ^5(x  -  x^y  -  y*)  *k2(x,y).  (117) 

i= 1  i=  1 
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We  define  g(kx,ky )  =  FT2[im(x,  y)]  as  the  Fourier  transform  of  the  refocused  image. 
Via  the  convolution,  shift,  and  linearity  properties  of  the  Fourier  transform,  g(kx,  ky ) 
is  given  by 


g{kx,  ky)  =  ki(k 


h 


Nx  N2 

)  e~2^kx+yiky)  +  k2(kx,  ky)  ^  e-2™(xlk*+ylky) .  (118) 

i=  1 


Z_V 

i—  1 


The  summands  are  plane  waves  whose  frequency  and  direction  of  propagation  are 
determined  by  the  delta  function  locations  (xi,yi).  These  plane  waves  determine  the 
fabric  of  Fourier  space,  the  intensity  of  which  is  then  modulated  by  the  Fourier  trans¬ 
formed  Gaussian  kernels.  It  is  difficult  to  simplify  further  without  making  further 
assumptions  about  the  structure  of  the  image.  We  make  a  large  simplification  by 
assuming  that  for  our  purposes  the  distribution  in  Fourier  space  can  be  adequately 
represented  by  a  uniform  distribution  multiplied  by  a  weighting  factor  which  corre¬ 
sponds  to  the  overall  ‘prominence’  of  the  objects  at  each  slope  within  the  scene.  In 
reality,  this  means  reducing  the  initial  distribution  of  delta  functions  to  one  weighted 
delta  function  at  (x  —  0,y  —  0)  for  each  depth: 


g(kx,  ky)  =  Wiki(kx,  ky)  +  W2k2(kx,  ky).  (119) 

The  Fourier  transform  of  the  Gaussian  convolution  kernel  is  a  second  Gaussian  with 
inverted  variance: 

k  =  \[^ex P  (-2?rV( k2x  +  k2y))  .  (120) 

Image  sharpness  is  quantified  by  a  metric  which  measures  the  amount  of  energy  at 
high  spatial  frequencies  in  the  image.  Here,  the  g(kx,  ky)  is  multiplied  by  a  parabolic 
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weighting  function  and  then  summed: 


r 1/2  r 1/2 

metric  =  /  /  [k2x  +  ) | ry ( ,  ky)\dkxdky. 

7-1/2  .7  —  1/2 


(121) 


For  a  single  delta  function  with  a  weight  of  unity,  this  integral  may  be  solved  approx¬ 
imately  in  polar  coordinates,  as  in 


metric  = 


,1/2 


r2\g(r)\rdrd9 


(122) 


Via  integration  by  parts,  this  is  solved  by 


metric  = 


1 


[l  -  exp  (^) 


(123) 


where  a  =  2tt2ct2.  This  expression  approaches  a  limit  of  1/32  at  a  =  0,  and  a  second 
derivative  of  1/1024.  It  is  fairly  well  approximated  by  the  Gaussian  having  these 
same  properties, 

metric  ~  — exp  ( - —  |  .  (124) 

32  1  V  4v/2 )  1  ' 

Upon  substituting,  consecutively,  for  a,  a,  and  r,  we  get  the  metric  in  terms  of  slope, 

m,  which  is  given  by 


metric  =  exp 


/  n2D2(m  —  mi)2\ 
V  32V2A  s2  ) 


exp 


(m  —  mi)5 

2^G 


(125) 


where  a  =  4(21/4)As/7t/1,  and  m  is  the  slope  at  which  the  light  held  is  sheared  to 
produce  the  image.  The  multiplicative  factor  has  been  dropped,  since  only  relative 
scale  is  important. 
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Figure  58.  Sparrow  Resolvability  Criterion.  Under  the  sparrow  resolvability  criterion, 
two  peaks  are  considered  resolved  at  the  separation  which  produces  a  point  where  the 
first  and  second  derivatives  of  the  combined  curve  jointly  go  to  zero. 


Without  loss  of  generality,  we  let  one  object  be  located  at  =  0,  and  the  other 
be  located  some  interval  Am  away.  The  total  metric  is  then  given  by 

(  m2  \  (  (m-Am)2\ 

metric  =  Wx  exp  (  J  +  W2  exp  ( - — - J  .  (126) 


The  question  we  seek  to  investigate  is  how  close  the  two  objects  can  be  in  slope  space 
before  the  two  peaks  can  no  longer  be  resolved.  The  Sparrow  resolvability  criterion 
specifies  the  point  beyond  which  the  two  objects  will  no  longer  produce  two  distinct 
maxima  (the  point  at  which  the  intervening  minimum  disappears).  This  occurs  where 
both  the  first  and  second  derivative  of  the  combined  signal  are  simultaneously  zero. 
Fig.  58  gives  a  graphical  illustration. 

Unfortunately,  for  two  Gaussian  functions  of  unequal  size,  the  separation  that 
satisfies  this  criterion  cannot  be  solved  analytically.  For  the  case  where  W\  =  W 2 , 
the  criterion  is  satisfied  at 

8(21/4)As 

Am  =  2a  =  - — ^ - .  (127) 

ttD 
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Or,  in  terms  of  the  sampled  light  held  slope,  m  =  mAu/As , 


Am 


8(21/4) 

7tNu 


(128) 


Fig.  59  provides  an  experimental  assessment  of  this  expression,  using  the  Lytro  Light 
Field  Camera.  The  Lytro  camera  has  approximately  11x11  subpixels  per  microlens, 
and  thus  the  minimum  slope  separation  evaluates  to  Am  =  0.28.  In  the  experimental 
tests,  the  minimum  separation  for  two  objects  was  seen  to  be  two  or  three  times  this 
value.  A  piece  of  the  explanation  may  involve  the  fact  that  only  two  orthogonal  strips 
of  the  Fourier  transformed  light  held  were  used  in  forming  the  metric  due  to  speed 
considerations.  The  theory  and  experimental  results  do  agree  that  Fourier  domain 
ranging  is  not  highly  effective  at  distinguishing  objects  in  a  scene,  compared  to  other 
methods.  Some  possible  approaches  to  more  effective  Fourier  domain  ranging  would 
be  a)  to  split  up  the  light  held  into  spatial  segments  and  apply  the  method  separately 
to  each  segment,  or  b)  to  ht  a  Gaussian  to  the  most  prominent  peak  in  the  metric 
and  then  extract  its  contribution  in  order  to  see  if  any  other  peaks  are  made  visible. 


Camera  Calibration. 

The  relationship  between  light  held  slope  and  object  distance  is  given  by 

m  =  ( 1  -  L  +  hA  7.  (129) 

This  relationship  involves  the  main  lens  focal  length,  the  distance  from  the  main  lens 
to  the  microlens  plane,  and  implicitly  parameters  such  as  the  main  lens  diameter  and 
microlens  size.  This  research  was  performed  partly  with  a  camera  for  which  these 
parameters  were  not  known.  Thus,  the  relationship  between  light  held  slope  and 
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Figure  59.  Fourier  Ranging  Test:  An  experimental  assessment  of  the  ability  to  resolve 
objects  using  Fourier  domain  ranging.  The  minimum  resolvable  separation  is  several 
times  larger  than  predicted  by  the  simplified  model.  However,  both  results  indicate 
that  Fourier  domain  ranging  is  not  well  suited  to  resolving  details  about  a  scene  when 
applied  globally. 
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Figure  60.  Camera  Calibration  Target.  Checkered  grids  of  varying  size  were  used  so 
that  regions  of  the  pattern  would  remain  sharp  under  camera  MTF  effects  as  the  target 
grew  more  distant. 

object  distance  was  determined  empirically  by  imaging  a  target  (See  Fig.  60)  at  a  set 
of  known  distances. 

The  Fourier  domain  ranging  method  provides  a  convenient  approach  for  perform¬ 
ing  this  calibration  since  it  naturally  gives  the  distance  of  the  most  prominent  object 
in  a  scene,  and  there  is  no  need  to  attempt  the  removal  of  noisy  depth  estimates 
occurring  in  previously  discussed  methods  where  image  gradient  is  low. 

Fig.  61  shows  a  comparison  of  the  Fourier  domain  image  sharpness  metric  with 
a  number  of  alternative  metrics.  Metric  ^2  was  formed  by  spatially  refocusing  the 
image  and  then  taking  its  Fourier  transform.  Metric  #  3  is  simply  a  plot  of  the  max¬ 
imum  gradient  magnitude  contained  in  a  spatially  refocused  image.  The  methods 
show  good  agreement  concerning  the  slope  corresponding  to  maximum  image  sharp¬ 
ness.  The  Fourier  domain  metric  is  preferable  since  it  avoids  the  expensive  spatial 
refocusing  step  required  by  the  other  methods. 
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Sampled  Slope 


Figure  61.  Slope  Estimation  Results.  The  metric  employed  by  method  2  is  the  maxi¬ 
mum  gradient  magnitude  contained  in  the  region  of  interest.  Method  3  uses  a  weighted 
sum  of  spectral  intensities. 
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(C)  (d) 

Figure  62.  Camera  Calibration  Plots.  Slope  measurements  obtained  for  a  target  at 
increasing  distances  from  the  camera  are  shown  in  (a).  Plot  (b)  demonstrates  the 
linear  relationship  between  z~x  and  fh.  The  residuals  of  the  slope  estimation,  shown  in 
(c),  maintain  a  regular  magnitude  as  the  slope  changes.  However,  since  A s0  oc  s^Am, 
uncertainty  in  distance  estimation  does  not  remain  constant,  as  demonstrated  in  (d). 


Fig.  62  shows  the  results  of  a  100-point  calibration  using  a  constant  90px  by  30px 
spatial  region  of  the  light  held,  cropped  prior  to  Fourier  transformation.  Fig.  62b 
illustrates  the  linear  relationship  between  z ~1  and  the  quantized  slope,  fh,  defined 
in  Eq.  129.  The  fit  to  this  line  provides  the  information  needed  to  use  the  camera 
for  absolute  ranging.  The  magnitude  of  the  residual  between  this  fit  and  the  slope 
estimate  (See  Fig.  62c)  spans  a  similar  range  as  the  slope  changes.  However,  as 
object  distance  increases,  slope  estimation  errors  are  amplified,  such  that  the  depth 
estimation  error  increases  quadratically  with  distance,  as  seen  in  Fig.  62d. 
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4.6  Summary 


A  major  goal  of  this  chapter  was  to  develop  models  for  describing  the  behavior  of 
the  sampled  light  field  slope  uncertainty,  in  terms  of  attributes  of  the  sampled 
light  field,  such  as  the  number  of  subapertures  Nu,  the  image  gradient  g,  and  the 
degree  of  noise,  a.  Such  a  model  is  the  key  piece  of  a  generalized  uncertainty  model, 
since  the  sampled  light  field  slope  uncertainty  can  be  directly  translated  to  a  distance 
uncertainty  as  long  as  various  camera  parameters  are  known.  Though  the  chapter 
derives  several  such  analytic  models,  the  models  do  not  consistently  line  up  with 
empirical  uncertainties  calculated  using  synthetic  light  fields. 

In  the  case  of  feature  matching,  theoretical  modeling  indicates  that  aqm  should 
diminish  as  Nu  is  increased  clue  to  the  additional  samples  provided  to  the  simple 
linear  regression.  However,  the  observed  fall-off  is  weaker  than  expected.  Further 
investigation  is  needed  to  determine  if  the  discrepancy  results  from  faulty  assumptions 
made  about  the  nature  of  the  localization  error  of  the  feature  detector  (namely,  that 
it  is  normally  distributed). 

For  the  photo-consistency  method,  theoretical  modeling  indicates  that  aqm  should 
decrease  as  Nu  increases  clue  to  what  is  effectively  an  improvement  in  the  signal  to 
noise  ratio  of  the  photo-consistency  curve.  This  type  of  behavior  is  observed,  but  not 
in  a  manner  that  consistently  follows  the  analytic  model.  The  discrepancy  is  likely 
due  to  the  existence  of  a  behavior  not  accounted  for  by  the  analytic  model,  in  which 
the  mean  square  error  for  an  entire  light  field  becomes  coupled  to  the  size  of  the 
range  of  slopes  used  to  perform  the  initial  shearing  of  the  light  field.  To  achieve  more 
conclusive  demonstration  of  agreement  between  the  theoretical  model  and  empirical 
results,  it  will  be  necessary  to  avoid  this  coupling  or  to  find  a  meaningful  uncertainty 
metric  that  avoids  this  problem. 
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Finally,  for  the  case  of  Fourier  domain  ranging,  both  theoretical  and  empirical 
results  indicate  that  the  performance  of  this  method  is  much  inferior  to  that  of  the 
spatial  domain  methods.  Therefore,  this  method  is  not  of  further  interest  for  our 
purposes. 

In  the  absence  of  a  theoretical  model  that  consistently  describes  the  behavior  of 
the  sampled  slope  uncertainty,  the  empirical  results  provided  in  the  preceding 
sections,  and  summarized  in  Table  5,  can  be  scaled  to  provide  range  uncertainties  for 
an  arbitrary  camera.  The  table  gives  the  average  empirical  sampled  slope  uncertain¬ 
ties  yielded  by  the  feature  matching  and  photo-consistency  methods  for  the  case  of 
zero  noise  added  to  the  light  held.  In  viewing  the  table,  it  is  good  to  keep  in  mind 
that  the  number  of  depth  estimations  provided  by  the  SIFT  method  is  much  less  than 
that  generated  by  the  photo-consistency  method. 


Table  5.  Empirical  Values  of  Sampled  Slope  Uncertainty,  ar-n ,  for  Zero  Added  Noise. 


Method 

Nu  =  3 

Nu  =  5 

AC  =  7 

Nu  —  9 

SIFT  (Fig.  37a) 

0.100 

0.082 

0.077 

0.075 

SIFT  (Fig.  37b,  Changing  As) 

0.228 

0.118 

0.074 

0.069 

SIFT  (Fig.  37b,  Changing  A q) 

0.229 

0.133 

0.100 

0.074 

SIFT  (Avg) 

0.186 

0.111 

0.084 

0.073 

Photo- Consistency  (Fig.  50  ) 

0.122 

0.104 

0.099 

0.094 

Photo- Consistency  (Fig.  52) 

0.310 

0.178 

0.126 

0.094 

Photo-Consistency  (Avg) 

0.216 

0.141 

0.113 

0.094 

As  discussed  in  the  chapter  introduction,  the  sampled  light  held  slope  uncertainties 
provided  in  the  table  can  be  scaled  to  the  continuous  light  held  slope  uncertainties 
by 

As 

&m  =  O’fh/'Y  =  •  (130) 

A  u 

Applying  the  substitutions  As  =  NuAq  and  A u  =  D/Nu,  we  see  that 

°m  =  ~]jNua™-  (131) 
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Via  Eq.  64,  which  relates  the  continuous  slope  uncertainty  to  distance  uncertainty, 
the  distance  uncertainty  is  then  given  by 


crz 


l  D  Nu  71 


(132) 


Typically,  the  value  of  crryi  in  Table  5  does  not  fall  with  Nu  rapidly  enough  to 
counter  the  factor  in  this  equation.  For  this  reason,  a  choice  of  Nu  =  3  is  op¬ 
timal  because  it  minimizes  uncertainty  while  allowing  for  application  of  the  photo¬ 
consistency  method.  Since  lowering  Nu  while  maintaining  a  constant  detector  size  A q 
is  achieved  by  reducing  the  microlens  size  As,  this  also  has  the  advantage  of  a  higher 
spatial  sampling  rate  at  the  microlens  plane.  There  may  be  a  practical  limit  to  how 
small  Nu  may  become  under  the  traditional  plcnoptic  camera  framework.  If  this  is  the 
case,  the  focused  plcnoptic  camera  provides  a  viable  alternative,  since  it  can  mimic 
the  performance  of  a  traditional  plcnoptic  camera  while  using  larger  microlenses  (see 
the  equivalences  in  Table  3). 

Choosing  Nu  =  3  and  picking  a  value  of  ar-n  ~  0.1  from  the  table  gives  an  empirical 
range  uncertainty  formula,  which  will  be  explored  in  the  next  chapter: 


/-r  _ 

O  7  - - 


z20Aq 


lm  D 


(133) 
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V.  Plenoptic  Camera  Utility 


The  previous  chapter  provides  the  following  empirical  expression  for  the  range 
uncertainty  of  a  camera  with  Nu  =  3  subapertures. 

„  -o2  Ag 
Gz  ~  i  d 

This  chapter  will  use  this  equation  to  provide  an  assessment  of  the  applications  for 
which  plenoptic  camera  is  well  suited.  Strictly  speaking,  the  equation  relates  uncer¬ 
tainty  to  lm ,  the  distance  from  the  main  lens  plane  to  the  microlens  plane.  However, 
in  order  to  avoid  the  light  field  spreading  effects  discussed  in  section  3.4,  lm  must 
be  on  the  same  order  as  /,  the  main  lens  focal  length.  Thus,  we  represent  /  rather 
than  lm  as  the  limiting  factor  in  depth  uncertainty.  Uncertainty  is  then  reduced  by 
minimizing  pixel  size,  A q,  or  maximizing  the  focal  length,  /,  and  lens  diameter,  D. 

To  make  Eq.  134  easy  to  grasp,  its  results  are  visualized  in  Fig.  63  for  three  sample 
cameras.  The  camera  specifications  were  selected  to  nominally  represent  cameras  from 
three  different  application  regimes.  The  space-based  camera  has  dimensions  similar  to 
that  of  the  Hubble  space  telescope.  The  airborne  camera  is  sized  such  that  it  might 
feasibly  be  mounted  on  some  type  of  manned  aircraft  or  UAV.  The  ground-based 
camera  might  be  wielded  by  a  small  scale  robot  interacting  with  nearby  objects.  The 
space-based  camera  experiences  uncertainty  on  the  order  of  10  meters  at  a  distance 
of  50km.  The  airborne  camera  yields  comparable  uncertainty  at  a  distance  of  500 
meters.  The  robotic  camera  has  an  uncertainty  of  1  meter  for  a  point  around  25 
meters  away.  The  logarithmic  scale  in  Fig.  64  is  useful  for  assessing  the  performance 
of  each  camera  over  a  broader  range  of  distances.  For  more  general  purposes,  Fig.  65 
provides  a  nomograph  usable  for  determining  the  uncertainty  of  an  arbitrary  camera. 
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Figure  63.  Plenoptic  Camera  Performance  Regimes.  The  space-based  camera  expe¬ 
riences  uncertainty  on  the  order  of  10  meters  at  a  distance  of  50km.  The  airborne 
camera  yields  comparable  uncertainty  at  a  distance  of  500  meters.  The  robotic  camera 
has  an  uncertainty  of  1  meter  for  a  point  around  25  meters  away. 
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Figure  64.  Logarithmic  Scale  Uncertainties. 


Figure  65.  Uncertainty  Nomograph.  The  nomograph  can  be  used  to  determine  the 
ranging  uncertainty  for  a  camera  with  arbitrary  parameters.  The  paths  plotted  corre¬ 
spond  roughly  to  the  notional  cameras  presented  in  Fig.  63. 
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5.1  Remote  Sensing  Application 


The  modern  remote  sensing  landscape  is  outfitted  with  a  variety  of  solutions  for 
obtaining  depth  information.  Two  such  methods  include  aerial  photogrammetry  and 
lidar.  Photogrammetry  is  the  process  of  extracting  real-world  position  information 
from  images  of  objects.  In  the  era  of  film  photography,  stereo-plotters  enabled  cartog¬ 
raphers  to  identify  correspondences  between  overlapping  images  taken  by  an  airborne 
camera.  The  development  of  Lidar,  a  technology  which  analyzes  the  reflected  re¬ 
sponse  from  an  active  light  signal  to  determine  distance,  provided  advantages  over 
the  photogrammetric  process  both  in  terms  of  precision  and  automated  workflow. 
These  initial  advantages  appear  to  have  given  the  technology  a  large  user  base  within 
the  remote  sensing  community  [33]. 

Lidar  systems  achieve  accurate  depth  estimations  typically  by  measuring  the  time 
of  flight  of  a  laser  pulse  reflected  from  a  surface  back  toward  the  receiver.  Since  Lidar 
does  not  depend  on  parallax  in  obtaining  depth,  its  depth  resolution  performance  is 
largely  independent  of  distance,  as  long  as  the  laser  source  is  powerful  enough  and 
the  detector  sensitive  enough  for  the  signal  to  be  registered  after  propagating  the  full 
distance. 

In  spite  of  the  lead  taken  by  Lidar  within  the  remote  sensing  field,  a  combination 
of  factors  including  the  proliferation  of  high  resolution  digital  imagers,  the  advance¬ 
ment  of  the  graphics  processing  unit,  and  the  fruition  of  automated  feature  matching 
techniques  developed  within  the  computer  vision  community  have  brought  new  vi¬ 
tality  to  an  era  of  digital  photogrammetry,  in  which  3D  maps  can  be  generated  with 
comparable  accuracy  and  greater  efficiency  than  afforded  by  laser  scanning  techniques 
[33].  Due  to  the  greater  complexity  of  lidar  systems,  photogrammetry  also  tends  to 
offer  a  significant  cost  advantage. 
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Lidar  and  Photogrammetry  are  held  to  standards  of  accuracy  set  within  the  remote 
sensing  community.  The  United  States  National  Map  Accuracy  Standards  (NMAS) 
specify  tolerances  for  contour  maps  generated  from  3D  geographic  data,  [34],  This 
standard  specifies  that  “not  more  than  10  percent  of  the  elevations  tested  shall  be  in 
error  more  than  one-half  the  contour  interval.”  More  recent  standards  specify  stricter 
tolerances  [35],  and  typical  contour  intervals  range  from  0.5  feet  to  10  feet  [36].  Users 
of  Lidar  systems  are  typically  able  to  certify  their  results  to  the  1’  contour  interval 
NMAS  standard  [37]. 

Fig.  63  illustrates  that,  for  practical  airborne  and  orbital  altitudes,  this  level  of 
accuracy  is  not  attainable.  Simply  put,  since  the  plenoptic  camera  does  not  signifi¬ 
cantly  improve  upon  the  ranging  performance  afforded  by  a  stereo  system,  it  is  not 
surprising  that  it  is  not  a  suitable  candidate  for  3D  terrestrial  mapping  from  airborne 
and  orbital  platforms. 

5.2  Autonomous  Navigation 

Though  the  accuracy  afforded  by  plenoptic  camera  ranging  is  most  likely  not 
well-suited  to  terrestrial  mapping  applications,  the  camera  does  afford  the  sort  of 
accuracy  appropriate  for  the  task  of  autonomous  navigation.  This  is  not  surprising, 
since  the  performance  of  the  plenoptic  camera  is  demonstrated  within  this  thesis  to 
be  very  comparable  to  that  of  stereo  ranging  systems,  which  are  commonly  employed 
for  robotic  applications. 

As  a  monocular  system,  the  plenoptic  camera  provides  a  passive  ranging  option 
that  requires  a  minimal  amount  of  hardware  and  space.  Light  field  ranging  techniques 
such  as  the  photo-consistency  method  discussed  in  this  thesis  may  offer  advantages 
over  standard  approaches  in  terms  of  ease  of  implementation  and  computational  load. 
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These  advantages  make  the  plenoptic  camera  a  likely  choice  for  incorporation  on  a 
small,  autonomous  robotic  system 

Other  technologies  employed  within  this  regime  include  structured  light  scanning 
and  time  of  flight  cameras.  Though  these  technologies  are  often  able  to  provide 
better  accuracy  than  the  plenoptic  camera,  as  active  techniques  they  involve  increased 
complexity  and  expense.  Unlike  passive,  image-correspondence  based  systems,  these 
technologies  do  not  simultaneously  provide  depth  information  and  imagery.  The  need 
for  a  separate  camera  to  provide  imagery  further  increases  the  complexity,  bulk,  and 
expense  of  such  a  system. 

5.3  3D  Video 

The  ability  to  simultaneously  collect  imagery  and  depth  information,  with  poten¬ 
tial  for  real  time  processing,  all  with  a  single  aperture  camera,  makes  it  difficult  not 
to  imagine  easily  recording  3D  videos  with  a  plenoptic  camera.  Outside  of  the  realm 
of  entertainment,  this  technology  has  applications  that  will  probably  only  be  fully 
understood  as  it  matures  and  proliferates. 

As  technologies  for  displaying  3D  video  reach  greater  maturity,  it  seems  likely  that 
3D  video  will  come  to  play  a  stronger  role  in  allowing  intelligence  analysts  or  central 
command  headquarters  to  receive  a  better  understanding  of  a  tactical  situation  via 
3D  cameras  deployed  to  the  site  of  operations.  This  will  lead  to  an  increased  demand 
for  devices  capable  of  capturing  3D  video  at  minimal  cost  and  operational  difficulty. 
The  plenoptic  camera  stands  alongside  other  technologies,  like  stereoscopic  cameras, 
in  a  position  to  fulfill  this  need. 

Fig.  63  indicates  that  a  plenoptic  camera  mounted  on  board  a  UAV  would  likely 
yield  only  coarse  landscape  depth  information  when  operating  at  typical  altitudes. 
In  contrast,  a  hand-held  or  helmet-mounted  camera  might  more  easily  succeed  at 
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providing  meaningful  3D  interaction  with  objects  within  10  to  20  meters  of  the  ob¬ 
server.  Security  cameras  of  this  scale  might  also  provide  additional  detail  sufficient 
to  improve  facial  identification. 

5.4  Future  Development 

In  the  near  future,  the  plenoptic  camera  appears  likely  to  find  its  most  comfortable 
applications  in  the  domains  of  small  scale  autonomous  robots,  hand-held  3D  video 
recording,  and  any  other  systems  requiring  passively-obtained  depth  information  for 
close  ranges.  Various  advancements  stand  to  extend  this  application  space. 

Advancements  in  the  availability  of  cheap,  light-weight,  large-diameter  optics  may 
play  a  part  in  making  larger  scale  plenoptic  range  cameras  practical.  Eq.  134  indicates 
that  increasing  the  diameter  of  the  camera’s  aperture  leads  to  significant  improvement 
in  depth  resolution  ability.  The  plenoptic  camera  may  assist  in  this  effort  by  allowing 
for  computational  correction  of  optical  aberrations.  Aberrations  tend  to  affect  how 
different  regions  of  a  collecting  lens  map  object  points  to  the  imaging  plane,  resulting 
in  a  broadened  point  spread  function  for  the  system.  By  separately  collecting  light 
from  different  aperture  regions,  the  plenoptic  camera  allows  for  the  mapping  from 
each  subaperture  to  be  separately  modified,  in  order  to  tighten  the  overall  point 
spread  function.  The  demonstration  of  this  capability  is  presented  in  [4]. 

Future  iterations  of  the  plenoptic  camera  may  no  longer  use  microlenses  to  achieve 
angular  sensitivity.  Angularly  sensitive  pixels  have  been  demonstrated  which  use 
stacked  gratings  to  selectively  transmit  light  [38].  Other  sensor  designs  integrate 
optical  elements  directly  into  the  detector  [39].  The  focused  plenoptic  camera  model 
allows  for  gaps  between  the  apertures  to  exist  without  resulting  in  spatial  sampling 
gaps.  These  gaps  in  turn  allow  for  a  sensor  design  which  enables  smaller  pixel  sizes 
than  is  otherwise  achievable,  though  at  the  cost  of  a  lower  SNR  [39].  Any  of  these 
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technologies,  by  yielding  an  angularly-sensitive  pixel  smaller  than  the  combination 
of  pixels  and  lenses  employed  within  lenslet  based  plenoptic  cameras,  could  feasibly 
provide  a  significant  improvement  to  range  performance. 
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VI.  Conclusion 


6.1  Contributions 

This  thesis  contains  a  number  of  contributions  to  the  literature  relating  to  plenop- 
tic  cameras.  At  the  level  of  general  plcnoptic  imaging,  the  thesis  improves  upon  pre¬ 
vious  descriptions  of  the  role  of  diffraction  within  a  plenoptic  camera,  as  in  [4],  In 
particular,  the  effects  of  diffraction  are  described  comprehensively  and  quantitatively, 
and  it  is  shown  that  it  is  impossible  to  achieve  critical  sampling  for  all  four  dimension 
of  the  light  field.  Again,  with  respect  to  plcnoptic  imaging,  the  thesis  provides  a 
link  between  the  traditional  plcnoptic  camera  [4],  and  the  ‘focused’  plcnoptic  camera 
[24],  eliminating  any  notion  of  a  fundamental  disparity  in  the  capabilities  of  the  two 
approaches. 

Finally,  the  thesis  provides  novel  analytic  models  for  describing  the  uncertainty 
in  the  plcnoptic  camera’s  ranging  uncertainty  for  a  number  of  estimation  frame¬ 
works.  For  example,  though  depth  estimation  using  the  photo-consistency  technique 
described  here  has  been  previously  demonstrated  in  [7],  no  characterization  of  the 
estimation  uncertainty  is  provided.  Again,  plcnoptic  rangefinding  within  the  Fourier 
domain  is  described  in  [12]  absent  of  any  model  for  range  uncertainty.  This  thesis 
supplies  these  techniques  with  models  which  describe  the  scale  of  the  uncertainty  to 
be  expected,  as  well  as  the  behavior  of  the  uncertainty  as  various  camera  param¬ 
eters  of  varied.  The  outcome  of  these  models  is  a  recommendation  concerning  the 
camera  construction  yielding  the  best  overall  performance  for  the  purposes  of  range 
estimation. 
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6.2  Future  Work 


Within  the  realm  of  plcnoptic  ranging  itself,  much  can  be  done  and  has  been  done 
beyond  the  techniques  described  within  this  thesis.  However,  the  comments  here  will 
be  limited  to  work  that  might  be  done  to  improve  upon  the  uncertainty  modeling 
which  constituted  the  primary  task  of  this  thesis.  Improvements  in  this  regard  might 
conceivably  take  two  forms. 

First,  none  of  the  models  presented  in  this  thesis  deal  adequately  with  the  impact 
of  sampling  on  the  image  characteristics  which  ultimately  relate  to  ranging  accuracy. 
The  model  dealing  with  accuracy  in  the  context  of  feature  matching  makes  no  attempt 
to  describe  the  behavior  of  the  localization  error  of  the  feature  detecting  algorithm. 
For  the  photo-consistency  method,  uncertainty  is  modeled  in  terms  of  the  local  gra¬ 
dient  strength  of  the  light  field.  However,  there  is  no  straightforward  relationship 
between  a  gradient  within  a  continuous  image  and  the  gradient  within  a  sampled  or 
blurred  image.  Rather,  the  effects  of  sampling  and  blurring  are  best  understood  with 
respect  to  image  spatial  frequencies.  For  a  more  robust  uncertainty  characterization, 
it  would  be  necessary  either  to  find  a  better  description  of  the  behavior  of  image  gra¬ 
dients  under  resampling,  or  to  formulate  uncertainty  in  terms  of  the  spatial  frequency 
content  of  an  image. 

A  different  possibility  would  be  to  take  a  more  empirical  approach  to  uncertainty 
by  using  a  large  sample  set  with  a  variety  of  estimation  methods  to  create  a  database 
describing  the  performance  of  different  approaches  in  various  scenarios.  Both  efforts 
would  be  aided  by  an  improved  plenoptic  camera  simulation  framework.  The  syn¬ 
thetic  light  fields  used  within  this  research  are  suspect  because  they  arise  from  an 
approach  which  generates  subaperture  images  using  simulated  ‘pinhole’  cameras.  Sec¬ 
tion  3.3  illustrates  that  subaperture  images  will  not  always  have  the  depth  of  field 
characteristics  of  such  a  ‘pinhole  image.’  An  improved  simulation  framework  might 
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also  include  diffraction  and  optional  lens  aberrations.  Finally,  a  simulation  employ¬ 
ing  accurate  radiometry  would  assist  correctly  modeling  signal  to  noise  ratio,  which  is 
demonstrated  in  this  thesis  to  strongly  effect  ranging  performance.  A  simulation  in¬ 
corporating  all  of  these  elements  would  be  crucial  in  any  effort  to  create  a  meaningful 
database  of  range  estimation  performance. 

6.3  Final  Remarks 

The  plenoptic  camera  is  a  milestone  device  which  promises  to  help  usher  in  a  new 
era  of  computational  photography.  The  plenoptic  camera’s  striking  ability  to  render 
refocused  images  stems  from  the  fact  that  it  samples  the  4D  radiance  distribution  at 
its  detector  plane,  rather  than  the  2D  intensity  distribution  sampled  by  conventional 
cameras.  When  this  distribution  is  formulated  as  a  light  field,  a  sequence  of  shearing 
and  projecting  the  light  field  imitates  the  image  formation  process  of  a  conventional 
camera  with  adjustable  focal  length. 

The  light  field  is  distinguished  from  data  sets  collected  by  stereoscopic  systems 
because  it  contains  images  obtained  by  an  N  by  N  grid  of  apertures,  rather  than  just 
the  two  apertures  of  the  stereoscopic  system.  Though  these  additional  views  enable 
the  camera  to  perform  novel  functions  like  the  generation  of  refocused  imagery,  it  is 
not  clear  that  they  provide  a  significant  advantage  in  terms  of  depth  resolution. 

Though  theoretical  considerations  within  this  paper  indicate  that  increasing  the 
angular  sampling  density  of  the  camera,  all  other  things  fixed,  should  result  in  a  better 
rangefinding  accuracy,  experimental  results  indicate  that  the  improvement  may  be 
fairly  minimal.  This  means  that,  when  there  is  a  choice  between  spatial  and  angular 
resolution,  as  in  the  tradeoff  induced  by  varying  the  microlens  size  in  a  conventional 
plenoptic  camera,  it  is  typically  desirable  to  maximize  spatial  resolution. 
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Though  the  effectiveness  of  light  field  ranging  techniques  compares  closely  to  that 
of  techniques  employed  in  stereoscopic  computer  vision,  the  plenoptic  camera  may  still 
offer  practical  advantages  in  terms  of  its  small  footprint,  low  cost,  and  minimal  need 
for  calibration.  At  its  present  level  of  development,  the  plenoptic  camera  fits  nicely 
into  an  application  space  that  includes  robotic  navigation,  3D  video  recording,  and 
security  monitoring.  This  application  space  may  continue  to  expand  as  developing 
technologies  allow  the  camera  to  achieve  acceptable  accuracy  at  greater  ranges. 
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Appendix  A.  Projection  Slice  Theorem 


In  this  appendix,  we  show  that  the  sequence  of  shearing,  projecting,  and  Fourier 
transforming  the  light  field  is  equivalent  to  the  sequence  of  Fourier  transforming, 
shearing,  and  slicing.  We  confine  the  math  to  the  two  dimensions  s  and  u.  However, 
extension  to  the  full  4D  case  is  straightforward.  For  this  appendix,  we  drop  the 
convention  of  representing  normalized  coordinates  with  over-bars.  The  goal  is  to 
show  the  following  equality: 

(TToVoB,m){L{*))  =  (5o^Tojr2)[f(x)].  (135) 

The  various  operators  employed  are  defined  in  Table  6. 

Eq.  135  is  demonstrated  by  simply  following  the  two  sequences  of  operations  and 
manipulating  the  result  to  show  equivalence.  The  first  consists  of  shearing,  projecting, 
and  taking  the  Fourier  transform.  The  shearing  operator  is  applied  first: 

#m[L(x)](x)  =  =  L(s  +  fhu,u) .  (136) 

This  is  followed  by  projection, 

Nu- 1 

(V  O  Brn)  [T(x)j  (s)  =  ^  L  (s  +  mu,  u ) ,  (137) 

u= 0 

and  finally,  a  one- dimensional  Fourier  transform, 

Ns  1  Nu  1  /  7  / \ 

(FToPo4)[L(x)](fcs)  =  EE  L(s'  +  mu,  u)  exp  (  —27ri-^—  J  .  (138) 

s'=0  u= 0  k  s  ' 
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Table  6.  Operator  Definitions 


Description  Symbol  Definition 


ID  Fourier  Transform  JrT[/(s)](/cs) 

Projection  V[f(x.)](s) 

Shear  £>[/(x)](x) 

Slice  S[f(k)](ks) 

Modified  Shear  H[/(x)](x) 

2D  Fourier  Transform  JrT2[A'(x)](k) 


Ns~ 1  r  /  ,s  V 

y  /(s)ex p  ~27ii  (ks]\rJ 

nu-i 

y 

u=0 

/(B-'x)  B«=  J  “j" 

f(ks,0) 


f(B -'x) 


1  - mNu/Ns 

0  1 


JV3-1  JV„-1 

EE  iF(s,  w)  exp 


s=0  it=0 


We  use  the  substitution  s  =  s'  +  mw  to  slightly  alter  the  form  of  the  equation: 

Ns-1  Nu  —  1+mti  /  lc  \ 

{TT  oV  o  J3rn)[L(x)](ks)  =  y  y  L(s,  u)  exp  (—2niy(s  —  mu)  \  .  (139) 

u= 0  s=fhu  x  ' 

The  second  route  is  to  take  the  2D  Fourier  transform  of  the  light  field,  and  then 
to  extract  a  slice  at  the  correct  angle.  Here,  angled  slicing  is  represented  by  applying 
the  modified  shear  operator  ( B ),  and  then  taking  the  central  angular  slice.  First,  we 
apply  to  2D  Fourier  transform: 

Ns- 1  Nu-1 

G(k)  =  Jzrr2[T(x)](k)  =  y  y  L(s,  u)  exp 

s=0  u= 0 

Next,  the  modified  shearing  operator: 

(BTTo_Fr2)[L(x)](k)  =  B«T[G(k)](k)  =  G  (C k)  =  G  (k„ +  ku)  .  (141) 


yy+ky)- 
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Finally,  the  slicing  operator  sets  ku  to  zero: 


(5oB-ro77")[L(x)](fc.) 


G  ^ks,  -m^k^j 

Na-1  Nk-  1 

=  Y1  L(s>u)exp 

s= 0  w=0 


(s  —  mu) 


(142) 


Comparison  reveals  that  Eqs.  139  and  142  are  identical,  save  a  minor  difference 
in  the  limits  of  the  summation  in  s. 
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